Partek Flow tutorials provide step-by-step instructions using a supplied data set to teach you how to use the software tools. Upon completion of each tutorial, you will be able to apply your knowledge in your own studies.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This tutorial gives an overview of RNA-Seq analysis with Partek Flow. It will guide you through creating an RNA-Seq analysis pipeline. The goals of the analysis are to create a list of differentially expressed genes, visualize these gene expression signatures by hierarchical clustering, and interpret the gene lists using gene ontology (GO) enrichment.
This tutorial will illustrate:
This tutorial uses a subset of the data set published in Xu et al. 2013 (PMID: 23902433). In the experiment, mRNA was isolated from HT29 colon cancer cells treated with the drug 5-aza-deoxy-cytidine (5-aza) at three different doses: 0μM (control), 5μM, or 10μM. The mRNA was sequenced using Illumina HiSeq (paired end reads). The goal of the experiment was to identify differentially expressed genes between the different treatment groups.
The Partek Flow Metadata Tab has an option to import data, and is where sample/cell attributes are managed. This is also where users can modify the location of the project output folder.
The Metadata tab can be used to import data. To add samples to the project, click Add data under Import; the available import options are displayed in a cascading menu (Figure 1).
This method adds samples by creating them simultaneously as the data gets imported into a project. The sample names are assigned automatically based on filenames.
Before proceeding, you should ideally have already transferred the data you wish to analyze to a folder (with appropriate permissions) on the Partek Flow server. Please seek assistance from your system administrator in uploading your data directly.
Select the Automatically create samples from files button. The next screen will feature a file browser that will show any folders you have access to in the Partek Flow server (Figure 2). Select a folder by clicking the folder name. Files in the selected folder that have file formats that can be imported by Partek Flow will be displayed and tick-marked on the right panel. You can exclude some files from the folder by unselecting the check mark on the left side of the filename. When you have made your selections, click the Create sample button.
Alternatively, files can be uploaded and imported into the project from the user's local computer. Only use this option if your file size is less than 500MB. Select the My computer radio button (Figure 3) and the options for selecting the local file and the upload (destination) directory will appear. Only one file at a time can be imported to a project using this method.
Multiple data files can be compressed into a single .zip file before uploading. Partek Flow will automatically unzip the files and put them in the upload directory.
Please be aware that the method illustrated in Figure 3 depends heavily on the speed and latency of the Internet connection between the user's computer and the server where Partek Flow is installed. Given the large size of most genomics data sets, this method is not recommended in most cases.
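As a minimal sketch of the bundling step described above, the following Python snippet compresses several data files into one .zip archive and checks the result against the 500MB guideline. The function name bundle_for_upload and the constant MAX_UPLOAD_BYTES are hypothetical, and treating 500MB as 500 x 1024 x 1024 bytes is an assumption; none of this is part of Partek Flow itself.

```python
import zipfile
from pathlib import Path

# Assumption: "500MB" is interpreted here as 500 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024


def bundle_for_upload(files, archive_path):
    """Compress several data files into one .zip archive.

    Returns (archive size in bytes, True if the archive is under the limit).
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            # Store each file under its bare name, without local directories.
            zf.write(f, arcname=f.name)
    size = archive_path.stat().st_size
    return size, size < MAX_UPLOAD_BYTES
```

Already-compressed inputs such as .fastq.gz files will shrink very little, so checking the archive size before uploading is the useful part of this sketch.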
After successful creation of samples from files, the Data tab now contains a Sample management table (Figure 4). The Sample name column in the table is automatically generated based on the filenames and the table is sorted in alphabetical order.
Clicking the Show data files link on the lower right side of the sample management table will expand the table and reveal the filenames of the files associated with each sample. Conversely, clicking Hide data files will hide the file information.
The columns in the expanded view show the files associated with each sample. Files are organized by file type. Any filename extensions that indicate compression (such as .gz) are not shown.
Once a sample is created in a project, the files associated with it can be modified. In the expanded view, mouse over the +/- column of a sample. The highlighted icons will correspond to the options for the sample on that row.
Samples can be added one at a time by selecting the Create a new blank sample option (Figure 5). In the following dialog box, type a sample name and click Create. This process creates a sample entry in the sample management table but there is no associated file with it, hence it is a "blank sample."
Expanding the Sample management table by clicking Show data files on the lower left corner of the table will reveal the option to associate files to the blank sample.
Alternatively, if you have a matrix of data, such as raw read count data in text format, select Import count matrix. The requirements of this text file are listed below:
The file contains numeric values in a tab-delimited format. Samples can be on rows while features (e.g., gene names) are in columns, or vice versa
The file contains unique sample IDs and feature IDs
If the data contains sample attribute information, all these attributes have to be either
The leftmost columns when samples are on rows (Figure 6)
The first few rows when samples are on columns (Figure 7)
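The requirements above can be illustrated with a short Python sketch that writes a count matrix with samples on rows, placing the sample attributes in the leftmost columns after the sample ID. The function name write_count_matrix, the "Sample ID" header text, and the example values are hypothetical; Partek Flow does not require this particular script, only a file with this general layout.

```python
import csv
import io


def write_count_matrix(sample_ids, attributes, counts, handle):
    """Write a tab-delimited count matrix with samples on rows.

    attributes: dict of attribute name -> list of values (one per sample)
    counts:     dict of feature ID -> list of counts (one per sample)
    """
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample IDs must be unique")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Column order: sample ID, then attribute columns, then feature columns.
    writer.writerow(["Sample ID", *attributes, *counts])
    for i, sample in enumerate(sample_ids):
        row = [sample]
        row += [values[i] for values in attributes.values()]
        row += [values[i] for values in counts.values()]
        writer.writerow(row)


# Example with made-up values; dict keys keep the feature IDs unique.
buffer = io.StringIO()
write_count_matrix(
    ["S1", "S2"],
    {"Dose": ["0uM", "5uM"]},
    {"GENE_A": [10, 3], "GENE_B": [0, 7]},
    buffer,
)
print(buffer.getvalue())
```

Writing to a file instead of an in-memory buffer only requires passing an open file handle (opened with newline="") in place of buffer.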
Like all other input files, you can upload the file from the Partek Flow server, My Computer or via a URL. Uploading the file brings up a file preview window (Figure 8). The preview of the first few rows and columns of the text file should help you determine on which rows/columns the relevant counts are located (the preview will display up to 100 rows and 100 columns). Inspect the text preview and indicate the orientation of the text file under File format>Input format.
If the read counts are based on a compatible annotation file in Partek Flow, you can specify that annotation file under Gene/feature annotation. Select the appropriate genome build and annotation model for your count data. Select the Contain sample attributes checkbox if your data includes additional sample information.
The example above shows a text file with samples listed on rows. The gene IDs are compatible with the hg19 RefSeq Transcripts - 2016-08-01 annotation model. Under the Column information and Row information sections, indicate the location of the Sample ID, which in this case is Column 1. Indicate the sample attribute location by marking where it starts, which in the example is Column 2. Mark where the Feature IDs start; in this case they are gene IDs and start at Column 4.
If the data has been log transformed, specify the base under Counts format.
The project output directory is the folder within the Partek Flow server where all output files produced during analysis will be stored.
The default directory is configured by the Partek Flow Administrator under the Settings menu (under System Preferences > Default project output directory).
If the user does not override the default, the task output will go to a subdirectory with the name of the Project.
After samples have been added in the project, additional information about the samples can be added. Information such as disease type, age, treatment, or sex can be annotated to the data by assigning the Attributes for each sample.
Certain tasks in Partek Flow, such as Gene-Specific Analysis, require that samples be assigned attributes in order to do statistical comparisons between groups or samples. As attributes are added to the project, additional columns in the sample management table will be created.
Attributes can be managed or created within a project. Under the Data tab, click the Manage attributes button to open the Manage attributes page (Figure 9).
To prepare for later data analysis using statistical tools, attributes can either be categorical or numeric (i.e., continuous).
For categorical attributes, there are two levels of visibility. Project-specific categorical attributes are visible only within the current project. System-wide categorical attributes are visible across all the projects within the Partek Flow server, and are useful for maintaining uniformity of terms. Importing samples in a new project will retain the system-wide attributes, but not the project-specific attributes.
A feature of Partek Flow is the use of controlled vocabulary for categorical attributes, allowing samples to be assigned only within pre-defined categories. It was designed to effectively manage content and data and allow teams to share common definitions. The use of standard terms minimizes confusion.
To add a categorical attribute in the Manage attributes page, click Add new attribute (Figure 10). In the dialog box, type a Name for the attribute, select the Categorical radio button next to Attribute type, select the visibility of the attribute, and then click the Add button.
Repeat the process for additional attributes of the samples in your study. When done, click Back to sample management table. Categorical attributes will default to Project-specific visibility.
To change the order of the attributes displayed on the Data tab, click an attribute name and drag and drop it. To change the order of the groups within an attribute, click a group name and drag and drop it vertically; the new order can be reflected in visualizations.
To add a numeric attribute in the Manage attributes page, click Add new attribute. In the dialog box (Figure 13), type a Name for the attribute, select the Numeric radio button next to Attribute type, and then click the Add button. Some optional parameters for numeric attributes include the Minimum value, Maximum value, and Units. When done, click Add to return to the Manage attributes page. Repeat the process to add more numeric attributes. When done, click Back to sample management table.
Since system-wide attributes do not have to be created by the current user, they only need to be added to the sample management table in a project.
In the Data tab, click the Add a system-wide attribute button. In the dialog box that follows (Figure 14), a drop-down menu is located next to Add attribute where you can select the system-wide attribute you would like to add to the project. Once selected, it will be recognized automatically as either a categorical or a numeric system-wide attribute.
For a system-wide categorical attribute, the different categories are listed and you have the option of pre-filling the columns with N/A (or any other category within the attribute). Click Add column and you will return to the Data tab.
After adding all the desired attributes to a project, the sample management table will show a new column for each attribute (Figure 15). The columns will initially show "N/A", as the samples have not yet been categorized or assigned a value. To edit the table, click Edit attributes. Assign the sample attributes by using a drop-down menu for categorical attributes (controlled vocabulary) or by typing values for numeric attributes.
When all the attributes have been entered, click Apply changes and the sample management table will be updated. After editing the sample table, make sure there are no fields with blank or N/A values before proceeding. To rename or delete attributes, click Manage attributes from the Data tab to access the Manage attributes page.
Another way to assign attributes to samples in the Data tab is to use a text file that contains the table of attributes and categories/values. This table is prepared outside of Partek Flow using any text editing software capable of saving tab-separated text files.
Using a text editor, prepare a table containing the attributes. An example is shown in Figure 16. There should only be one tab between columns with no extra tabs after the last column. In this particular example, the first column contains the filename and the text file is saved as Sampleinfo.txt.
The first row of the table in the text file contains the attributes (as headers). The first column of the table in the text file, regardless of the header of the first column, should contain either the sample names or the file names of the samples already added in Partek Flow. The first column is the unique identifier that will match the samples to the correct values or categories.
To upload sample attributes, click Assign sample attributes from a file in the Data tab. Then indicate where the attribute text file is stored and navigate to it. Partek Flow will parse the text file and present attributes that will be available for import (Figure 17).
Select the attributes you want to import by clicking the Import check box. Imported attributes that do not currently exist in the project will create new project-specific attributes.
You can change the name of a specific attribute by editing the Attribute name text box. Columns containing letter characters are automatically selected as categorical attributes. Columns containing numbers are suggested to be numeric attributes and can be changed to categorical using the drop down menu under Attribute type.
The first column is always the unique identifier and can refer only to File names or Sample names.
If using Sample names in the first column, they must match the entries of the Sample name column in the Sample management table.
If using File names in the first column, use the filenames shown in the fastq column of the expanded sample management table (see Figure 4) then add the extension .gz. All filenames must include the complete file extension (e.g., Samplename.fastq.gz).
The header name of the first column of the table (top left cell of our text table) is irrelevant but should not be left blank. Whether the first column contains File names or Sample names will be chosen during the process.
The last column cannot have empty values
Missing data (blank cells) can only be handled if the attribute is numeric. If it is categorical, please put a character in it.
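Several of the formatting rules above can be checked before uploading. The following Python sketch is one way to do that, under stated assumptions: check_attribute_table is a hypothetical helper, not a Partek Flow function, and it only tests the rules named in the text (a non-blank top-left header, a consistent number of tab-separated columns, no empty values in the last column, and unique identifiers in the first column).

```python
def check_attribute_table(text):
    """Return a list of problems found in a tab-separated attribute table."""
    problems = []
    rows = [line.split("\t") for line in text.splitlines() if line != ""]
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if header[0].strip() == "":
        problems.append("top-left header cell is blank")
    for n, row in enumerate(rows, start=1):
        if len(row) != len(header):
            # An extra tab after the last column shows up as an extra column.
            problems.append(
                f"line {n}: expected {len(header)} columns, found {len(row)}"
            )
        elif row[-1].strip() == "":
            problems.append(f"line {n}: last column is empty")
    identifiers = [row[0] for row in rows[1:]]
    if len(set(identifiers)) != len(identifiers):
        problems.append("first column contains duplicate identifiers")
    return problems
```

An empty list means none of these particular problems were found; it does not guarantee that the identifiers match the sample names or file names already in the project.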
It is advisable to use Sample name as the first column identifier when:
Samples are associated with more than one file (for instance, paired-end reads and/or technical replicates).
The files were imported in the SRA format (from the Sequence Read Archive database). In Partek Flow, they are automatically converted to the FASTQ format. Consequently, their filenames change once they are imported. The new file names can be seen by expanding the sample management table; the new extension is .fastq.gz.
If attributes are assigned from two different text files, the following will happen:
If the previous attributes have the same header and type (both are either categorical or numeric), the values are overwritten.
If there are different/additional headers on the "second round" of assignment, these new attributes will be appended to the table.
For numeric attributes, a "blank" value will not override a previous value.
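The three rules above can be modeled with a small Python sketch. This is only an illustration of the described behavior, not Partek Flow's implementation: the function assign_attributes and its data layout are hypothetical, blank cells are represented as None, samples missing from the project are skipped by assumption, and because the text does not say what happens when the same header has a different type in the second file, the sketch raises an error in that case.

```python
def assign_attributes(table, types, new_values, new_types):
    """Apply a second round of attribute assignment to `table` in place.

    table:      dict of sample -> dict of attribute -> value (None = blank)
    types:      dict of attribute -> "categorical" or "numeric"
    new_values: same shape as `table`, parsed from the second text file
    new_types:  types of the attributes found in the second text file
    """
    for attr, new_type in new_types.items():
        if attr in types and types[attr] != new_type:
            # Not described in the text; treated as an error in this sketch.
            raise ValueError(f"type mismatch for attribute {attr!r}")
        types.setdefault(attr, new_type)  # new headers are appended
    for sample, row in new_values.items():
        if sample not in table:
            continue  # assumption: attributes never create new samples
        current = table[sample]
        for attr, value in row.items():
            if value is None and types[attr] == "numeric":
                # A blank numeric value does not override a previous value.
                current.setdefault(attr, None)
                continue
            current[attr] = value  # same header and type: overwrite
    return table
```

For example, if a sample already has a Dose of 5.0 and the second file leaves Dose blank for it, the value stays 5.0, while a changed categorical value in the second file replaces the earlier one.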
The attributes assigned to the samples within the Data tab will be associated with the samples throughout the project. During the course of analysis, Partek Flow tasks generate various tables and any attributes associated with a sample can be included in the table as optional columns. An example is shown in Figure 18 for a pre-alignment QA/QC report, where the Optional columns link on the top left of the table reveals the different sample attributes.
In the Data tab, each sample can be renamed or deleted from the project by clicking the gear icon next to the sample name. The gear icon becomes visible on mouse over (Figure 19). A sample can only be deleted if no analysis has been performed on the data yet. If any analysis has been performed on the data node, the delete sample operation is not shown. To exclude certain samples from further analysis in that case, filter the samples in a downstream analysis step. Deleting a sample from a project does not delete the associated files, which will remain on the disk.
After samples have been added to a Partek Flow project and associated with valid data files, a data node will appear in the Analyses tab (Figure 1). The Analyses tab is where different analysis tools and the corresponding reports are accessed.
The Analyses tab contains two elements: data nodes (circles) and task nodes (rounded rectangles) connected by lines and arrows. Collectively, they represent a data analysis pipeline.
Data nodes (Figure 2) may represent a file imported into the project, or a file generated by Partek Flow as an output of a task (e.g., alignment of FASTQ files generates BAM files).
Figure 2. The Analyses tab showing a data node of unaligned reads
Task nodes (Figure 3) represent the analysis steps performed on the data within a project. For details on the tasks available in Partek Flow, see the specific chapters of this user manual dedicated to the different tasks.
Clicking on a node reveals the context sensitive menu on the right side of the screen.
Only the tasks that are available for the selected data node will be listed in the menu. For data nodes, actions that can be performed on that specific data type will appear.
In Figure 4, a node that contains Unaligned reads is selected (bold black line). The tasks listed are the ones that can be performed on unaligned data (QA/QC, Pre-analysis tools, and Aligners).
After a task is performed on a data node, a new task node is created and connected to the original data node. Depending on the task, a new data node may automatically be generated that contains the resulting data. For details of individual tasks, see Task Menu.
In Figure 5, alignment was performed on the unaligned reads. Two additional nodes were added: a task node for Align reads and an output data node containing the Aligned reads.
To run a task, select a data node and then locate the task you wish to perform from the context sensitive menu. Mouse over to see a description of the action to be performed. Click the specific task, set the additional parameters (Figure 6), and click Finish. The task will be scheduled, the display will refresh, and the screen will return to the project's Analyses tab.
In Figure 6, the STAR aligner was selected and the choices for the aligner index and additional alignment options appeared.
Tasks that are currently running (or scheduled in the queue) appear as translucent nodes. The progress of the task is indicated by the progress bar within the task node. Hovering the mouse pointer over the node will highlight the related nodes (with a thin black outline) and display the status of the task (Figure 7).
If a task is expected to generate data nodes, expected nodes appear even before the task is completed. They will have a lighter shade of color to indicate that they have not yet been generated as the task is still being performed. Once all tasks are done, all nodes would appear in the same shade.
Tasks can only be cancelled by the user that started the task or by the owner of the project. Running or pending tasks can be canceled by clicking the right mouse button on the task node and then selecting Cancel (Figure 8). Alternatively, the task node may be selected and the Cancel task selected from the context sensitive menu.
A verification dialog will appear (Figure 9) asking you to confirm the task cancellation. The cancelled tasks will remain in the Analyses tab but will be flagged by gray x circles on the nodes (Figure 10).
Data nodes connected to incomplete tasks are also incomplete as no output can be generated (Figure 10).
To delete tasks from the project click the right mouse button on the task node and then select Delete (Figure 11). Alternatively, click the task node and select Delete task from the context sensitive menu.
Selecting a task node will reveal a menu pane with two sections: Task results and Task actions (Figure 13).
Items in the Task results section provide information on the action performed in that node. Certain tasks generate a Task report (Figure 14), which includes any tables or charts that the task may have produced.
The Task details shows detailed diagnostic information about the task (Figure 15). It includes the duration and parameters of the task, lists of input and output files, and the actual commands (in the command line) that were run.
Additionally, the Task details page contains the error logs of unsuccessful runs. The user can download the logs or send them directly to Partek. This page plays an important role in diagnosing and troubleshooting issues related to tasks.
Double clicking on a task node will show the Task report page. However, if no report was generated, the user will be directed to the Task details page.
In the Task actions section, the selected task can be re-run with different settings by selecting Re-run w/new parameters. If the task is part of a pipeline that includes additional tasks after it, re-running the downstream tasks as well is also an option. Re-running tasks will result in a new layer being created in the Analyses tab.
Another action available for a task node is Add task description (Figure 16), which is a way to add notes to the project. The user can enter a description, which will be displayed when the mouse pointer is hovered over the task node.
It is common for next-generation sequencing data analysis to examine different task parameters for optimization. Users may want to modify an upstream step (e.g. alignment parameters) and inspect its effect on downstream results (e.g. percent aligned reads).
The implementation of Layers makes optimizations easy and organized. Instead of creating separate nodes in a pipeline, another set of nodes with a different color is stacked on top of previous analyses (Figure 17). To see the parameters that were changed between runs, hover the mouse icon over the set of stacked task nodes and a pop-up balloon will display them. The text color signifies the layer corresponding to a specific parameter.
Layers are formed when the same task is performed on the same data node more than once. They are also formed when a task node is selected and Re-run w/new parameters is chosen from the context sensitive menu. This allows users to change the options only for the selected task. The user may choose to re-run the task to which the changes have been made, as well as all the downstream tasks until the analysis is completed. To do so, select Re-run w/new parameters, downstream tasks from the context sensitive menu.
To select a different layer, use the left mouse button to click on any node of the desired layer. All the nodes associated with the selected layer have the same color and when clicked will be displayed on the top of the stack.
Adding task nodes and the resulting data nodes to a project may lead to long pipelines that extend well beyond the border of the canvas (Figure 18).
In that case, the pipeline can be collapsed to hide the steps that are no longer relevant. For example, once single-cell RNA-seq data has been quantified, the Single cell counts data node becomes the new starting point for analysis, as the subsequent analyses will not focus on alignment, UMI deduplication, etc. To start, right-click on the task node that should become one boundary of the collapsed portion of your pipeline and select Collapse tasks (Figure 19).
All the tasks on that layer will turn purple. Then left-click the task which should be the other boundary of the collapsed portion. All the tasks that will be collapsed will turn green and a dialog will appear (Figure 19). In the example shown in Figure 19, the tasks between Trim tags and Quantify barcodes will be collapsed. Give the collapsed section a name (up to 30 characters) and select Save (Figure 20).
The collapsed portion of the pipeline is replaced by a single task node with a custom label ("Single cell preprocessing"; Figure 21).
To re-expand the pipeline double click on the task node representing the collapsed portion of the pipeline. Alternatively, single click on the node and select Expand... on the context-sensitive menu. Within the same menu, you can also preview the contents of the collapsed task by selecting View... (Figure 22).
The Log tab contains a table of the tasks that are running, scheduled, or completed within Partek Flow (Figure 1). It provides an overview of task progress, enables task management, and links to detailed reports for each task.
Each row of the table corresponds to a task node in the Analyses tab. The list can be sorted according to a specific column using the sort icon.
The Task column lists the name of the tasks. On the left of the task name is a colored circle indicating the layer of this task. The column is searchable by task name. Clicking the task name will open the Task report page. If the task did not generate a report, the link will go to the Task details page.
The User column identifies the task owner. Aside from the user who created the project, only collaborators and users with admin privileges can start tasks in a project. Clicking a name in the User column will display the corresponding User profile.
The End column shows when the task was completed. It will show the actual time for completed tasks, and the estimated time for running tasks. These estimates improve in accuracy as more tasks are completed in the current Partek Flow instance.
The Status column displays the current status of the task, such as Waiting, Running, Done, or Canceled. If the task is currently running, a status progress bar will appear in the column. Once completed, the status of a task will be Done and the End column will be updated with the completion time.
Using a web browser, log in to Partek Flow. From the Home page click the New Project button; enter a project name (Figure 1) and then click Create project.
The Project name is the basis of the default name of the output directory for this project. Project names are unique, thus a new project cannot have the same name as an existing project within the same Partek Flow server.
Once a new project has been created, the user is automatically directed to the Analyses tab of the Project View (Figure 2).
A project may be deleted from the Partek Flow server using the button on the upper right side of the Project View page (Figure 1).
Every project can be exported before it is removed from the server. By exporting old projects, you can free up some storage on your server. You can import the exported project back into Partek Flow later if needed.
When you export a project, you will be asked whether to include library files in the export. If you choose Yes, the current versions of the library files used in the project will be archived, so you can reproduce the results when you later import the project and redo the analysis. However, this makes the archive larger. If you choose No, the library files will not be exported. Note that when you import the project later, you can only use the versions of the required library files that are available at that time to redo the analysis, and the results might not be the same.
The Import project option is under the Projects drop-down menu at the top of the Partek Flow page (Figure 5). It can be accessed from any Partek Flow page.
The input of this option is the zipped file of the exported project. Browse to the file location which can either be the Partek Flow server, a local machine, or a URL. The zip file first needs to be uploaded to the Partek Flow server (if it is not on the server already), and then Partek Flow will unpack the zip file into a project. The project name will be the same as the exported project name. If the project with the same name already exists, the imported project will have a number appended to it (e.g., ProjectName_1).
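The naming behavior for imported projects can be sketched as follows. The text only gives the example ProjectName_1; the function imported_project_name is hypothetical, and counting upward until an unused name is found is an assumption about what happens when ProjectName_1 is also taken.

```python
def imported_project_name(name, existing_names):
    """Return the name an imported project would receive.

    Assumption: the appended number increases until the name is unused.
    """
    if name not in existing_names:
        return name
    n = 1
    while f"{name}_{n}" in existing_names:
        n += 1
    return f"{name}_{n}"
```

With an existing project called ProjectName, importing an archive of the same name would yield ProjectName_1 under this sketch.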
The owner of the imported project will be the user that imported it.
If a project is publicly available in the Gene Expression Omnibus (GEO) and European Nucleotide Archive (ENA) databases, you can import associated FASTQ files, sample attributes, and project details automatically into Partek Flow.
Click Projects at the top of the page
Click Import project
Choose GEO / ENA project for Select files from
Type the BioProject ID or the GEO Accession number
The format of a BioProject ID is PRJNA followed by one to six numbers (e.g., PRJNA291540). The format of a GEO Accession number is GSE followed by one to five numbers (e.g., GSE71578).
Click Import project at the bottom
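The accession formats described in the steps above can be checked with a short Python sketch before starting an import. The patterns follow the lengths stated in this guide (PRJNA plus one to six digits, GSE plus one to five digits); the names BIOPROJECT_RE, GEO_SERIES_RE, and classify_accession are hypothetical and are not part of Partek Flow.

```python
import re

# Patterns follow the formats stated in the text above.
BIOPROJECT_RE = re.compile(r"PRJNA\d{1,6}")
GEO_SERIES_RE = re.compile(r"GSE\d{1,5}")


def classify_accession(text):
    """Return "BioProject", "GEO", or None for an accession string."""
    text = text.strip()
    if BIOPROJECT_RE.fullmatch(text):
        return "BioProject"
    if GEO_SERIES_RE.fullmatch(text):
        return "GEO"
    return None


print(classify_accession("PRJNA291540"))
print(classify_accession("GSE71578"))
```

A matching format only means the identifier is well formed; whether the project is publicly available in both GEO and ENA still has to be checked on the respective websites.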
The Analyses tab will include an Unaligned reads data node once the data download has started (Figure 3). It may take a while for the download to complete depending on the size of the data. FASTQ files are downloaded from the ENA BioProject page.
If the study is not publicly available in both GEO and ENA, project import will not succeed.
If there is an ENA project, but the FASTQ files are not available through ENA, the project will be created, but data will not be imported.
A variety of other issues and irregularities can cause imports to not succeed or partially succeed, including, but not limited to, a BioProject having multiple associated GSE IDs, incomplete information on the GEO or ENA page, and either the GEO or ENA project not being publicly available.
The Gene Expression Omnibus (GEO) and the European Nucleotide Archive (ENA) are web-accessible public repositories for genomic data and experiments. Access and learn more about their resources at their respective websites:
GEO - https://www.ncbi.nlm.nih.gov/geo/
ENA - https://www.ebi.ac.uk/ena
You can search ENA using the GEO ID (e.g., GSE71578) to check if there is a matching ENA project (Figure 6).
With attributes added, we can begin building our pipeline.
Click the Analyses tab
In the Analyses tab, data are represented as circles, termed data nodes. One data node, mRNA, should be visible in the Analyses tab (Figure 1).
Click the mRNA node
Clicking a data node brings up the context-sensitive task menu with tasks that can be performed on the data node (Figure 2).
Pre-alignment QA/QC assesses the quality of the unaligned reads and will help us determine whether trimming or filtering is necessary.
Click Pre-alignment QA/QC in the QA/QC section of the task menu
Click Finish to run the task with default settings
Running a task creates a task node, e.g. the blue rectangle labeled Pre-alignment QA/QC (Figure 3), which contains details on the task and a report. While tasks are queued or in progress, they have a lighter color. Any output nodes that the task will generate are also displayed in a lighter color until the task completes. Once the task begins running, a progress bar is displayed on the task node.
Click the Pre-alignment QA/QC node
The context-sensitive task menu (Figure 4) shows the option to view the Task report and the Task details. You can also access a task report by double-clicking on a task node.
Click Task report
Pre-alignment QA/QC provides information about the sequencing quality of unaligned reads (Figure 5). Both project-level summaries and sample-level summaries are provided.
Click sample SRR592573 in the data table of the report to open its sample-level report
The Average base quality score per position graph in the upper right-hand panel (Figure 6) gives the average Phred score for each position in the reads.
A Phred score is a measure of base call accuracy with a higher score indicating greater accuracy.
By convention, a score above 20 is considered adequate. As you can see, the standard error bars in the graph show that some reads have quality scores below 20 for some of their base pair calls near the 3' end.
Based on the results of Pre-alignment QA/QC, while most of the reads are high quality, we will need to perform read trimming and filtering. For more details about the contents of the task report, please see the Pre-alignment QA/QC user guide.
Click RNA-Seq 5-AZA to return to the Analyses tab
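To make the Phred threshold discussed above concrete, the following minimal sketch converts between Phred scores and base-call error probabilities using the standard definition Q = -10 * log10(P). The function names are illustrative and are not part of Partek Flow.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability back to a Phred score."""
    return -10 * math.log10(p)


# Print the error probability for a few commonly quoted scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
```

Under this definition, a score of 20 corresponds to a 1 in 100 chance that the base call is incorrect, and a score of 30 to a 1 in 1000 chance.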
With our reads trimmed, we now have high-quality reads for each sample. The next step is to align the reads to a reference genome. Alignment matches each of the short sequencing reads to a location in the reference genome.
Click Trimmed reads
Click Aligners in the task menu to display available aligners (Figure 1)
Partek Flow offers a variety of different aligners. Mouse over any option for a short description. For this tutorial, we will use STAR, a fast and accurate aligner commonly used for RNA-Seq data. For more information about STAR and the other aligners, please consult the Aligners user guide.
Click STAR
The STAR aligner options allow us to select the genome build (assembly) and index. For this tutorial, our data set contains only reads that map to chromosome 22 to minimize the time required for resource-intensive tasks, such as alignment.
Click Finish to run with hg19 selected for Assembly and Whole genome for the Aligner index (Figure 2)
Alignment is a resource-intensive task and may take around 20 minutes to complete, even when mapping only reads from a single chromosome. Task and data nodes that have been queued, but not completed, are shown in a lighter color than completed tasks (Figure 3).
The Align reads task generates an Aligned reads data node once complete. You can wait for the alignment task to finish or you can continue building the pipeline while the results of alignment are pending; additional tasks can be added to the pipeline and queued before the current task has completed.
After alignment has completed, we can view the quality of alignment by performing post-alignment QA/QC.
Click the Aligned reads data node
Click QA/QC in the task menu
Click Post-alignment QA/QC from the QA/QC section of the task menu (Figure 1)
A Post-alignment QA/QC task node will be generated (Figure 2).
Double-click the Post-alignment QA/QC task node to view the task report
Similar to the Pre-alignment QA/QC task report, general quality information about the whole data set is displayed and sample-level reports can be opened by clicking a sample name in the table.
The top two graphs in the data set view (Figure 3) show the alignment breakdown and coverage.
From these graphs, we can see that more than 95% of reads were aligned, but the total number of reads for each sample varies. Normalizing for the variability in total read counts will be addressed in a later section of the tutorial.
For more information about the graphs and information presented in the Post-alignment QA/QC task report, see the Post-alignment QA/QC user guide.
Attributes describe samples. Examples of sample attributes include treatment group, age, sex, and time point. Attributes can be added individually in the Metadata tab or in bulk using a text file. In this tutorial, we will add one attribute, 5-AZA Dose, manually.
Click the Metadata tab
Click Manage under Sample attributes (Figure 1)
Click Add new attribute (Figure 2)
To configure a new attribute, at Name, type in 5-AZA Dose as the name of the attribute
Click Add to add 5-AZA Dose as a categorical, project-specific attribute (Figure 3)
Name the first New category 0uM
Click the green plus icon to add category (Figure 4)
Repeat for two additional categories, 5uM and 10uM (Figure 5)
Click Back to metadata tab
The data table now includes an Attribute column for 5-AZA Dose (Figure 6). Next, we need to assign samples attribute categories for 5-AZA Dose.
Select Assign values
The option to edit the 5-AZA Dose field for each sample will appear as a drop-down menu (Figure 7).
Select the 5-AZA Dose text box for a sample to bring up a drop-down menu with the 5-AZA Dose attribute categories (0uM, 5uM, 10uM)
Use the drop-down menus to add a treatment group for each sample
The first three samples (SRR592573-5) should be 0uM, the next three samples (SRR592576-8) should be 5uM, and the final three samples (SRR592579-81) should be 10uM (Figure 8).
Click Apply changes
The data table will now show a 5-AZA Dose attribute for each sample.
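For the bulk route mentioned at the start of this section (adding attributes from a text file), the sample-to-dose assignment above could be generated programmatically. The sketch below writes a tab-delimited table; the column headers and the tab-delimited layout are assumptions for illustration, so check the Partek Flow documentation for the exact file format it expects.

```python
import csv
import io

# Sample-to-dose assignment taken from the tutorial text:
# SRR592573-5 -> 0uM, SRR592576-8 -> 5uM, SRR592579-81 -> 10uM.
DOSES = {
    "0uM": range(592573, 592576),
    "5uM": range(592576, 592579),
    "10uM": range(592579, 592582),
}


def build_attribute_table() -> str:
    """Return a tab-delimited table with one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Hypothetical header names; the required headers are not given in this tutorial.
    writer.writerow(["Sample name", "5-AZA Dose"])
    for dose, accession_numbers in DOSES.items():
        for number in accession_numbers:
            writer.writerow([f"SRR{number}", dose])
    return buffer.getvalue()


print(build_attribute_table())
```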
Low expression genes may be indistinguishable from noise and will decrease the sensitivity of differential expression analysis.
Click the Gene counts node
Click Filtering in the task menu
Click Filter features (Figure 1)
Click Noise reduction filter
Set the filter to maximum <= 10
Click Finish (Figure 2)
A new Filtered counts node will be created (Figure 3).
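The effect of the noise reduction filter can be sketched in a few lines. This example assumes the filter excludes every feature whose maximum count across samples is less than or equal to 10; the gene names and counts are made up for illustration.

```python
def noise_reduction_filter(counts, max_threshold=10):
    """Keep genes whose maximum count across samples exceeds max_threshold."""
    return {
        gene: values
        for gene, values in counts.items()
        if max(values) > max_threshold
    }


# Hypothetical raw counts: one list of per-sample counts per gene.
counts = {
    "GENE_A": [0, 3, 10, 2],   # maximum is exactly 10, so it is excluded
    "GENE_B": [5, 40, 12, 9],  # maximum is 40, so it is kept
    "GENE_C": [0, 0, 0, 0],    # never detected, so it is excluded
}

filtered = noise_reduction_filter(counts)
print(sorted(filtered))
```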
Because different samples have different total numbers of reads, it would be misleading to calculate differential expression by comparing read count numbers for genes across samples without normalizing for the total number of reads.
Click the Filtered counts data node
Click Normalization and scaling in the task menu
Click Normalization (Figure 1)
The Count normalization menu will open (Figure 2).
Normalization can be performed by sample or by feature. By sample is selected by default; this is appropriate for the tutorial data set.
Available normalization methods are listed in the left-hand panel. For more information about these options, please see the Normalize counts user guide.
For this tutorial, we will use the recommended default normalization settings.
This adds the Median ratio normalization method, which is suitable for performing differential expression analysis using DESeq2 (Figure 3).
Click Finish to perform normalization
A Normalize counts task node and a Normalized counts data node are added to the pipeline (Figure 4)
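To illustrate what a median ratio normalization does, here is a minimal sketch of the median-of-ratios size factor calculation as described for DESeq2. It is an illustration under that assumption, not Partek Flow's implementation, and the gene names and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Return one size factor per sample.

    counts maps each gene to a list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if min(values) <= 0:
            continue  # a zero count gives no finite log geometric mean; skip the gene
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {g: [v / f for v, f in zip(vals, factors)] for g, vals in counts.items()}


# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1.
raw = {"g1": [10, 20], "g2": [100, 200], "g3": [50, 100], "g4": [0, 4]}
factors = size_factors(raw)
normalized = normalize(raw)
print(factors)
print(normalized["g1"])
```

In this toy data set the second size factor is twice the first, so after normalization the two samples have equal values for g1, g2, and g3.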
The Project details section shows the Name of the project as well as (optional) a textual project Description and a Thumbnail (picture) (Figure 1).
The owner and collaborators (if any) can customize the Description and Thumbnail entries by pushing the orange Edit project details button (Figure 2). The fields can now be edited to:
Rename the project (names are limited to 30 characters). The original project Name is the one selected when creating the project
Add or change a project description (up to 2000 characters)
Add or change a thumbnail of the project (supported formats are .jpg, .bmp, .gif, .png; the maximum size of the image file is 10 MB)
The Description accepts hyperlinks starting with "http://" or "https://"; clicking a link opens the website in a new browser tab. The description is also displayed to collaborators and administrators on the Partek® Flow® Home Page. The Choose File button launches a file browser showing the directory structure of the local computer, from which the thumbnail image file can be uploaded. The Clear thumbnail button removes the current thumbnail.
Once all the edits have been made, push Save to accept (or Cancel to reject).
If a thumbnail has been added, it will appear on the Project details tab (Figure 3) and on the home page of Partek Flow, on the Details tab of the project.
The Members section provides an overview of users associated with a particular project and enables project creators (owners) and administrators to add collaborators (Figure 1). A user (without administrator status) has to be specified as a collaborator in a project to be able to access the project in his/her home folder and to perform tasks.
Pushing the pencil icon next to a project member opens one of two dialogs, depending on the status of the member. For a collaborator or a viewer, the pencil icon changes the member's role (e.g. from a Viewer to a Collaborator) (Figure 4).
Moreover, the project owner can transfer the ownership to another user account (one of the accounts already available at the current instance of Partek Flow) using the New owner dropdown list (Figure 5). The previous (old) owner can remain as a project collaborator, with the help of the matching option.
If email notifications are turned on for project ownership transfer, an email dialog box appears. This can be used to add additional text to the notification email body (Figure 6).
Partek Flow software manages separate experiments as projects. A complete project consists of input data, tasks used to analyze the data, the resulting output files, and a list of users involved in the analysis.
This chapter provides instructions for creating and analyzing a project and covers:
Based on pre-alignment QA/QC, we need to trim low quality bases from the 3' end of reads.
Click the Unaligned reads data node
Click Pre-alignment tools in the task menu
Click Trim bases (Figure 1)
By default, Trim bases removes bases starting at the 3' end and continuing until it finds a base pair call with a Phred score equal to or greater than 35 (Figure 2).
Click Finish to run Trim bases with default settings
The Trim bases task will generate a new data node, Trimmed reads (Figure 3). We can view the task report for Trim bases by double-clicking either the Trim bases task node or the Trimmed reads data node or choosing Task report from the task menu.
Double-click the Trimmed reads data node to open the task report
The report shows the percentage of trimmed reads and reads removed in a spreadsheet and two graphs (Figure 4).
The results are fairly consistent across samples with ~2% of reads untrimmed, ~86% trimmed, and ~12% removed for each. The average quality score for each sample is increased with higher average quality scores at the 3' ends.
Click RNA-Seq 5-AZA to return to the Analyses tab
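The default trimming rule described in this section (remove bases from the 3' end until a base with a Phred score of at least 35 is found) can be sketched as follows. This is a simplified illustration, not Partek Flow's implementation; it does not model how reads are removed entirely, since the criteria for that are not stated here.

```python
def trim_3prime(sequence, qualities, min_quality=35):
    """Trim from the 3' end until a base with quality >= min_quality is reached."""
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


# Hypothetical read whose quality drops toward the 3' end.
read = "ACGTACGT"
quals = [38, 37, 36, 40, 36, 30, 20, 12]

trimmed_read, trimmed_quals = trim_3prime(read, quals)
print(trimmed_read, trimmed_quals)
```

Only the trailing low-quality bases are removed; a low-quality base in the middle of a read is left in place because trimming stops at the first base that meets the threshold.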
The tutorial data set includes 9 samples equally divided into 3 treatment groups. Sequencing was performed by an Illumina HiSeq (paired-end reads), but the workflow can be easily adapted for data generated by other sequencers. Each sample has 2 fastq files for a total of 18 fastq files.
You can obtain the tutorial data set through Partek Flow.
Click your avatar
Click Settings in the drop-down menu (Figure 1)
At the top of the System information page, there is a section labeled Download tutorial data (Figure 2).
Click RNA-Seq 5-AZA to download the tutorial data set
A new project will be created and you will be directed to the Analyses Tab. The data will be downloaded automatically (Figure 3) and imported into your project. Because this is a tutorial project, there is no need to click on Add data as it will be done automatically.
At first the project is empty, but the file download will start automatically in the background. You can wait a few minutes then refresh your browser or you can monitor the download progress using the Queue.
Click Queue
Click View Queued Tasks in the drop-down menu
The Queued tasks page will open (Figure 4).
Click Projects
Click RNA-Seq 5-AZA in the drop-down menu
The Analyses tab will open (Figure 5). If your download has completed, you will see a blue circle titled mRNA.
Once the download completes, the sample table will appear in the Metadata tab.
Click the Metadata tab
The Metadata tab includes the sample table with the names of each imported sample (Figure 6).
In the next section of the tutorial, we will add a sample attribute that indicates the treatment group of each sample.
The principal components analysis (PCA) scatter plot allows us to visualize similarities and differences between the samples in a data set.
Click the Normalized counts data node
Click Exploratory analysis in the task menu
Click PCA
Click Finish to run PCA with the default options
The PCA task node will be added to the pipeline (Figure 1)
Double-click the PCA data node to open the PCA scatter plot (Figure 2)
In the Data Viewer, click Style under Configure and set the Color by drop-down to 5-AZA Dose
The scatter plot shows each sample as a sphere, colored by treatment group, in a three-dimensional plot. The x, y, and z axes are the first three principal components. The percentage of total variance explained by each is listed next to the axis label. The size of each axis is determined by the variance along that axis. The plot is fully interactive; it can be rotated and points selected.
Here, we can see that samples separate based on treatment, but there is noticeable separation within treatment groups, particularly the 0μM and 10μM treatment groups.
For more detailed information about the PCA scatter plot, please see the PCA user guide.
RNA-Seq uses the number of sequencing reads per gene or transcript to quantify gene expression. Once reads are aligned to a reference genome, we need to assign each read to a known transcript or gene to give a read-count per transcript or gene.
Click the Aligned reads data node
Click Quantification in the task menu
We will use Partek E/M to quantify reads to an annotation model in this tutorial. For more information about the other quantification options, please see the Quantification user guide.
Click Quantify to an annotation model (Partek E/M) (Figure 1)
Choose the latest RefSeq Transcripts 95 annotation from the Gene/feature annotation drop-down menu (you may need to download it first, via Library File Management)
Click Finish (Figure 2)
The Quantify to annotation model task node outputs two data nodes, Gene counts and Transcript counts (Figure 3).
To view the results of quantification, we can select either data node output.
Double-click the Gene counts data node to view the task report
The task report details the number of reads within exons, introns, and intergenic regions. For detailed information about the quantification results, see the Quantify to annotation model (Partek E/M) user guide.
After normalizing the data, we can perform differential analysis to identify genes that are differentially expressed based on treatment.
Click the Normalized counts node
Click Statistics in the task menu
Click Differential analysis in the task menu (Figure 1)
Check 5-AZA Dose and click Add factors to add the attribute to the statistical model.
Select Next to continue with 5-AZA Dose as the selected attribute
The Comparisons page will open (Figure 4).
It is easiest to think about comparisons as the questions we are asking. In this case, we want to know what are the differentially expressed genes between untreated and treated cells. We can ask this for each dose individually and for both collectively.
The upper box will be the numerator and the lower box will be the denominator in the comparison calculation so we will select the 0μM control in the lower box.
Drag 5μM to the upper box
Drag 0μM to the lower box
Click Add comparison to add 5μM vs. 0μM to the comparison table (Figure 5)
Repeat to create comparisons for 10μM vs. 0μM and 10μM,5μM vs. 0μM (Figure 6)
Click Finish to perform DESeq2 as configured
A DESeq2 task node and a DESeq2 data node will be added to the pipeline (Figure 7).
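The numerator and denominator roles in a comparison can be illustrated with a simple ratio of group means. This is a deliberately simplified sketch: DESeq2 estimates fold changes from a statistical model rather than a plain ratio of means, the expression values below are made up, and the signed fold-change convention (ratios below 1 shown as negative reciprocals) is an assumption about how the results are reported.

```python
from statistics import mean


def fold_change(numerator, denominator):
    """Ratio of mean expression: upper box (numerator) over lower box (denominator)."""
    return mean(numerator) / mean(denominator)


def signed_fold_change(ratio):
    """Express ratios below 1 as negative reciprocals, so 0.5 becomes -2 (assumed convention)."""
    return ratio if ratio >= 1 else -1 / ratio


# Hypothetical normalized counts for one gene, three replicates per dose.
expr = {
    "0uM": [100, 110, 90],
    "5uM": [210, 190, 200],
    "10uM": [400, 380, 420],
}

fc_5_vs_0 = fold_change(expr["5uM"], expr["0uM"])
fc_0_vs_5 = fold_change(expr["0uM"], expr["5uM"])
# Pooled comparison: both treated groups together against the control.
fc_treated_vs_0 = fold_change(expr["5uM"] + expr["10uM"], expr["0uM"])

print(fc_5_vs_0, signed_fold_change(fc_0_vs_5), fc_treated_vs_0)
```

Swapping the boxes inverts the ratio, which is why the control group goes in the lower box when the question is how treatment changes expression relative to untreated cells.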
Once we have performed DESeq2 to identify differentially expressed genes, we can create a list of significantly differentially expressed genes using cutoff thresholds.
Double-click the Feature list data node to open the task report
The task report spreadsheet will open showing genes on rows and the results of the DESeq2 on columns (Figure 1).
To get a sense of what filtering thresholds to set, we can view a volcano plot for a comparison.
A volcano plot will open showing p-value on the y-axis and fold-change on the x-axis (Figure 2). If the gene labels are on (not shown), click on the plot to turn them off.
Thresholds for the cutoff lines are set using the Statistics card (Configuration panel > Configure > Statistics). The default thresholds are |2| for the X axis and 0.05 for the Y axis.
Switch to the browser tab showing the DESeq2 report
Click FDR step up
Click the triangle next to FDR step up to open the FDR step up options
Leave All contrasts selected
Set the cutoff value to 0.05. Hit Enter.
This will include genes that have an FDR step up value of less than or equal to 0.05 for all three contrasts, 5μM vs. 0μM, 10μM vs. 0μM and 5μM:10μM vs. 0μM. FDR step up is the false discovery rate adjusted p-value, used by convention in microarray and next generation sequencing data sets in place of the unadjusted p-value.
Click Fold-change
Click the triangle next to Fold-change to open the Fold-change options
Leave All contrasts selected
Set to From -2 to 2 with Exclude range selected. Hit Enter.
Note that the number of genes that pass the filter is listed at the top of the filter menu next to Results: and will update to reflect any changes to the filter. Here, 27 genes pass the filter (Figure 3). Depending on your settings, the number may be slightly different.
This creates a Filter list task node and a Filtered feature list data node (Figure 4).
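The two filters applied above can be sketched in code. This example assumes that FDR step up corresponds to the Benjamini-Hochberg step-up procedure and that the excluded fold-change range is open at both ends (values of exactly -2 or 2 pass); both are assumptions for illustration, and the p-values and fold changes are made up.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_filter(fdr, fold_change, fdr_cutoff=0.05, fc_low=-2.0, fc_high=2.0):
    """Keep a gene if FDR <= cutoff and its fold change lies outside (fc_low, fc_high)."""
    return fdr <= fdr_cutoff and not (fc_low < fold_change < fc_high)


# Hypothetical results for four genes in a single contrast.
p_values = [0.01, 0.04, 0.03, 0.005]
fold_changes = [3.1, 1.4, -2.6, -1.2]

fdr = fdr_step_up(p_values)
kept = [i for i in range(len(p_values)) if passes_filter(fdr[i], fold_changes[i])]
print(fdr, kept)
```

With All contrasts selected, a gene would need to pass this check in every contrast, which could be expressed by wrapping `passes_filter` in `all(...)` over the contrasts.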
To check how well our list of differentially expressed genes distinguishes one treatment group from another, we can perform hierarchical clustering based on the gene list. Clustering can also be used to discover novel groups within your data set, identify gene expression signatures distinguishing groups of samples, and to identify genes with similar patterns of gene expression across samples.
Click the Feature list data node
Click Exploratory analysis in the task menu
Click Hierarchical clustering (Figure 1)
The Hierarchical clustering menu will open (Figure 2). Hierarchical clustering can be performed with a heatmap or bubble map plot. Cluster must be selected under Ordering for both Feature order and Cell order if both the features (columns) and cells (rows) are to be clustered.
Click Finish to run with default settings
A Hierarchical clustering task node will be added to the pipeline (Figure 3).
Double-click the Hierarchical clustering / heatmap task node to launch the heatmap
The Dendrogram view will open showing a heatmap with the hierarchical clustering results (Figure 4).
Samples are shown on rows and genes on columns. Clustering for samples and genes is shown through the dendrogram trees. More similar samples/genes are separated by fewer branch points of the dendrogram tree.
The heatmap displays standardized expression values with a mean of zero and standard deviation of one.
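Standardization to a mean of zero and a standard deviation of one is a z-score transformation. The sketch below applies it to the expression values of a single gene across samples; standardizing per gene and using the population standard deviation are assumptions made for illustration, and the input values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift to mean zero and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant gene carries no pattern to show
    return [(v - mu) / sigma for v in values]


# Hypothetical normalized expression of one gene in three samples.
z_scores = standardize([2.0, 4.0, 6.0])
print(z_scores)
```

Because every gene ends up on the same scale, highly expressed genes do not dominate the color range of the heatmap.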
The heatmap can be customized to improve data visualization using the menu on the Configuration panel on the left.
Expand the Annotations > Data card.
In the dialog, click on the Gene counts node
Now set the Row annot to 5-AZA Dose
Samples are now labeled with their 5-AZA Dose group (Figure 5).
Samples cluster based on treatment group and the 5μM and 10μM groups are more similar to each other than to the 0μM group.
We can save the heatmap as a publication-quality image.
Choose size and resolution using the Save as SVG dialog (Figure 6)
Select Save
The heatmap will be saved as a .svg file and downloaded in your web browser.
For more information about hierarchical clustering and the Dendrogram view, please see the Hierarchical Clustering user guide.
To learn more about the biology underlying gene expression changes, we can use gene ontology (GO) or pathway enrichment analysis. Enrichment analysis identifies over-represented GO terms or pathways in a filtered list of genes.
Click the Filtered feature list data node
Click Biological interpretation in the task menu
Click Gene set enrichment then select Gene set database to perform GO enrichment analysis (Figure 1)
Select the latest gene set from geneontology.org from the Gene set database drop-down menu
Click Finish
A GO enrichment task node will be added to the pipeline (Figure 2).
Double-click the Gene set enrichment task node to open the task report (Figure 3)
The GO enrichment task report spreadsheet lists GO terms by ascending p-value with the most significant GO term at the top of the list. Also included are the enrichment score, the number of genes from that GO term in the list, and the number of genes from that GO term that are not in the list.
For more information about GO enrichment analysis, please see the Gene Set Enrichment user guide.
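Over-representation of a GO term in a gene list is commonly tested with a one-sided hypergeometric (Fisher's exact) test. The sketch below computes that p-value from the counts described in the report; whether Partek Flow uses exactly this test is an assumption, and the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, list_size, set_size, background_size):
    """Probability of at least `overlap` gene-set members in a random list.

    overlap:          genes from the GO term that are in the filtered list
    list_size:        number of genes in the filtered list
    set_size:         number of genes annotated to the GO term
    background_size:  number of genes that could have been selected
    """
    max_overlap = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 3 of 4 listed genes belong to a 5-gene term, out of 20 genes.
p = enrichment_p_value(overlap=3, list_size=4, set_size=5, background_size=20)
print(round(p, 4))
```

A small p-value indicates that the GO term contributes more genes to the list than would be expected by chance.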
KEGG enrichment analysis identifies pathways that are over-represented in a gene list data node.
Click the Filtered feature list data node
Click Biological interpretation in the task menu
Click Gene set enrichment then select KEGG database
Click Finish in the configuration dialog to run KEGG analysis with the Homo sapiens KEGG database
A Pathway enrichment task node will be added to the pipeline (Figure 4).
Double-click the Pathway enrichment task node to open the task report
The Pathway enrichment task report is similar to the Enrichment analysis task report (Figure 5).
To view an interactive KEGG pathway map, click the pathway ID (Gene set column).
By following the steps in this tutorial, you have built a pipeline. You can save this pipeline for future use.
Select Create new pipeline near the bottom left-hand side of the browser window
Select the Pre-alignment QA/QC, Trim bases, Align reads, Post-alignment QA/QC, Quantify to annotation model, Filter features, Normalize counts, PCA, and GSA task nodes to include them in the pipeline
Name the pipeline; we have chosen RNA-Seq basic analysis
Give a description for the pipeline; we have noted trim <20, STAR, normalize with total count and add 0.001, GSA
Select Create pipeline (Figure 1)
To access this pipeline in the future, select an unaligned reads data node and choose Pipelines from the task menu. Available saved pipelines will be available to choose from the Pipelines section of the task menu (Figure 2).
In addition to the volcano plot showing all genes, we can view expression levels of each gene on a dot plot.
Double-click the Filtered feature list data node to open the task report
Click the FDR step up header in the 5uM vs. 0uM section to sort by ascending FDR step up
In the task report table, there is a column labeled View with three icons in each row.
Select to open a dot plot for the gene SELENOM
The dot plot for SELENOM (Figure 1) shows each sample as a point with normalized reads on the y-axis. Samples are separated and colored by treatment group.
The Attachments tab allows the project owner to add external (i.e. non-Partek Flow) files to a project (for instance, spreadsheets, Word documents, or manuscripts). To attach a file, go to the Attachments tab (Figure 1). The Choose File button opens a file browser showing the directory structure of the local computer. Select the file that you want to attach and then click the Upload attachment button. For security reasons, Partek Flow will not allow you to add an executable file.
All added files will be listed in the table under the Attachments tab (Figure 2). The tab will also display file sizes, the user name of the person who uploaded the file and the time it was uploaded. Note that uploaded files will count towards the total size of the project, and thus, if available, to the disk quota of the project owner.
To remove a file, click the icon. To download the attachment, click the icon.
This tutorial presents an outline of the basic series of steps for analyzing a 10x Genomics Gene Expression with Feature Barcoding (antibody) data set in Partek Flow starting with the output of Cell Ranger.
If you have Cell Hashing data, please see our documentation on .
This tutorial includes only one sample, but the same steps will be followed when analyzing multiple samples. For notes on a few aspects specific to a multi-sample analysis, please see our tutorial.
If you are new to Partek Flow, please see for information about data transfer and import and for information about the Partek Flow user interface.
The data set for this tutorial is a demonstration data set from 10x Genomics. The sample includes cells from a dissociated Extranodal Marginal Zone B-Cell Tumor (MALT: Mucosa-Associated Lymphoid Tissue) stained with BioLegend TotalSeq-B antibodies. We are starting with the produced by Cell Ranger. Prior to beginning, transfer this file to your Partek Flow using the Transfer files button on the homepage.
From the DESeq2 task report, we can browse to any gene in the Chromosome view.
Click in the SELENOM row to open Chromosome view (Figure 1)
A new tab will open showing SELENOM in the Chromosome view (Figure 2).
Chromosome View shows reference genome, annotation, and data set information together aligned at genomic coordinates.
The top track shows average number of total count normalized reads for each of the three treatment groups in a stacked histogram. The second track shows the RefSeq annotation.
We can add tracks from any data node using Select Tracks.
Click Select tracks
A pop-up dialog showing the pipeline allows us to choose which data to display as tracks in Chromosome view (Figure 3).
Click Reads pileup under Aligned reads on the left-hand side of the dialog
Click Display tracks to make the change
The reads pileup track is now included (Figure 4).
The Single cell counts data node contains two different types of data, mRNA expression and protein expression. So that we can process these two different types of data separately, we will split the data by data type.
Click the Single cell counts data node
Click Pre-analysis tools in the toolbox
Click Split by feature type
A rectangular task node will be created along with two circular data nodes, one for each data type (Figure 1). The labels for these data types are determined by features.csv file used when processing the data with Cell Ranger. Here, our data is labeled Gene Expression, for the mRNA data, and Antibody Capture, for the protein data.
An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few counts to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts. These are low-quality cells that can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts or a low number of detected features. You can do this in Partek Flow using the Single cell QA/QC task.
We will start with the protein data.
Click the Antibody Capture data node
Click QA/QC in the toolbox
Click Single Cell QA/QC
This produces a Single-cell QA/QC task node (Figure 2).
Double-click the Single cell QA/QC task node to open the task report
The Single cell QA/QC report opens in a new data viewer session. There are interactive violin plots showing the most commonly used quality metrics for each cell: the total count per cell and the number of detected features per cell (Figure 3). Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric.
For this analysis, we will set a maximum counts threshold to exclude potential protein aggregates and, because we expect every cell to be bound by several antibodies, we will also set a minimum counts threshold.
Select one of the plots on the canvas
In the Select & Filter icon on the left under Tools, set the Counts threshold to keep cells between 500 and 20000 (Figure 4)
Click Apply observation filter...
Select the Antibody Capture data node as input in the pipeline preview (Figure 5)
Click Select
You will see a message telling you a new task has been enqueued.
Click OK to dismiss the message
Click the project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
A new task, Filter counts, is added to the Analyses tab. This task produces a new Filter counts data node.
Next, we can repeat this process for the Gene Expression data node.
Click the Gene Expression data node
Click the QA/QC section in the toolbox
Click Single Cell QA/QC
This produces a Single-cell QA/QC task node
Double-click the Single cell QA/QC task node to open the task report
The task report lists the number of counts per cell, the number of detected features per cell, the percentage of mitochondrial reads per cell, and the percentage of ribosomal counts per cell in four violin plots (Figure 6). For this analysis, we will set maximum and minimum thresholds for total counts and detected genes to exclude potential doublets and a maximum mitochondrial reads percentage filter to exclude potential dead or dying cells. There is no need to apply a filter based on the percentage of ribosomal counts in this tutorial.
In the Selection card on the right, set the Counts threshold to keep cells between 1500 and 15000
Set the Detected features to keep cells between 400 and 4000
Set the % Mitochondrial counts to keep cells between 0% and 20% (Figure 6)
Click Apply observations filter
Select the Gene Expression data node as input in the pipeline preview
Click Select
Click OK to dismiss the message about the task being enqueued
Click the project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
A new task, Filter counts, is added to the Analyses tab. This task produces a new Filtered counts data node (Figure 7)
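The three thresholds set above amount to a range check on each cell's QC metrics. The following minimal Python sketch illustrates that logic. It is not Partek Flow's implementation, and the metric names, the inclusive bounds, and the example cells are assumptions made for illustration.

```python
# Hypothetical metric names; the ranges are the ones used in this tutorial.
THRESHOLDS = {
    "counts": (1500, 15000),            # total counts per cell
    "detected_features": (400, 4000),   # detected genes per cell
    "pct_mito": (0.0, 20.0),            # % mitochondrial counts
}

def passes_qc(cell, thresholds=THRESHOLDS):
    """Return True if every metric lies within its [min, max] range.

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    return all(lo <= cell[name] <= hi for name, (lo, hi) in thresholds.items())

# Made-up cells for illustration only.
cells = {
    "cell_a": {"counts": 5200, "detected_features": 1800, "pct_mito": 4.1},
    "cell_b": {"counts": 32000, "detected_features": 5600, "pct_mito": 3.0},  # possible doublet
    "cell_c": {"counts": 2100, "detected_features": 650, "pct_mito": 41.5},   # possibly dead or dying
}

kept = [name for name, metrics in cells.items() if passes_qc(metrics)]
print(kept)
```

A cell is kept only if it passes all three checks, so a single out-of-range metric is enough to exclude it.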
After excluding low-quality cells, we can normalize the data.
We will start with the protein data.
Click the Filtered counts data node produced by filtering the Antibody Capture data node
Click Normalization and scaling in the toolbox
Click Normalization
Click Finish to run (Figure 8)
The recommended normalization for protein data includes the following steps: Add 1, Divide by Geometric mean, Add 1, Log base 2. This is a variant of Centered log-ratio (CLR), which was used to normalize antibody capture protein counts data in the paper that introduced CITE-Seq [1] and in subsequent publications on similar assays [2, 3]. CLR normalization includes the following steps: Add 1, Divide by Geometric mean, Add 1, Log base e. Normalizing the protein data to base 2 instead of e allows for better integration with gene expression data further downstream. If you would prefer to use CLR, click and drag CLR from the panel on the left to the right. If you do choose to use CLR, we recommend making sure the gene expression data is also normalized to base e, to allow for smoother integration further downstream.
Normalization produces a Normalized counts data node on the Antibody Capture branch of the pipeline.
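To make the two step lists above concrete, here is a minimal Python sketch of the arithmetic. It assumes the geometric mean is computed across the protein features of each cell, which the text does not state. The function name and the example counts are hypothetical.

```python
import math

def clr_variant(counts, log_base=2.0):
    """Add 1, divide by the geometric mean, add 1, then take the log.

    `counts` holds the raw antibody counts for ONE cell. Taking the
    geometric mean across that cell's features is an assumption.
    """
    shifted = [c + 1.0 for c in counts]
    geo_mean = math.exp(sum(math.log(v) for v in shifted) / len(shifted))
    return [math.log(v / geo_mean + 1.0, log_base) for v in shifted]

cell = [0, 3, 15, 120]               # made-up counts for four proteins
base2 = clr_variant(cell)            # Log base 2 (recommended variant)
base_e = clr_variant(cell, math.e)   # Log base e (CLR as listed above)
print(base2)
print(base_e)
```

The two results differ only by the constant factor 1/ln(2), so the choice of base changes the scale of the values rather than their ordering.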
Next, we can normalize the mRNA data. We will use the recommended normalization method in Partek Flow, which accounts for differences in library size (the total number of UMI counts per cell), adds 1, and log2-transforms the data.
Click the Filtered counts data node produced by filtering the Gene Expression data node
Click the Normalization and scaling section in the toolbox
Click Normalization
Click Finish to run (Figure 9)
Normalization produces a Normalized counts data node on the Gene Expression branch of the pipeline (Figure 10).
For quality filtering and normalization, we needed to have the two data types separate as the processing steps were distinct. For downstream analysis, we want to be able to analyze protein and mRNA data together. To bring the two data types back together, we will merge the two normalized counts data nodes.
Click the Normalized counts data node on the Antibody Capture branch of the pipeline
Click Pre-analysis tools in the toolbox
Click Merge matrices
Click Select data node to launch the data node selector
Data nodes that can be merged with the Antibody Capture branch Normalized counts data node are shown in color (Figure 11).
Click the Normalized counts data node on the Gene Expression branch of the pipeline (Figure 11)
Click Select
Click Finish to run the task
The output is a Merged counts data node (Figure 12). This data node will include the normalized counts of our protein and mRNA data. The intersection of cells from the two input data nodes is retained so only cells that passed the quality filter for both protein and mRNA data will be included in the Merged counts data node.
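The intersection behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Partek Flow stores or merges data. The function name, barcodes, and feature names are hypothetical.

```python
def merge_matrices(protein, rna):
    """Combine per-cell feature values, keeping only cells found in both inputs."""
    shared = [barcode for barcode in protein if barcode in rna]
    return {barcode: {**protein[barcode], **rna[barcode]} for barcode in shared}

# Made-up normalized values keyed by cell barcode.
protein = {
    "AAAC": {"CD3_protein": 2.1},
    "AAAG": {"CD3_protein": 0.4},   # failed the mRNA quality filter
    "AACT": {"CD3_protein": 1.7},
}
rna = {
    "AAAC": {"CD3D": 5.2},
    "AACT": {"CD3D": 0.0},
    "AAGG": {"CD3D": 3.3},          # failed the protein quality filter
}

merged = merge_matrices(protein, rna)
print(sorted(merged))
```

Cells that passed only one of the two quality filters are dropped, and each remaining cell carries both its protein and its mRNA values.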
To simplify the appearance of the pipeline, we can group task nodes into a single collapsed task. Here, we will collapse the filtering and normalization steps.
Right-click the Split by feature type task node
Choose Collapse tasks from the pop-up dialog (Figure 13)
Tasks that can be selected for the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 14). We have chosen the Split matrix task as the start and we can choose Merge matrices as the end of the collapsed section.
Click the Merge matrices task to choose it as the end of the collapsed section
Name the Collapsed task Data processing
Click Save (Figure 15)
The new collapsed task, Data processing, appears as a single rectangular task node (Figure 16).
To view the tasks in Data processing, we can expand the collapsed task.
Double-click Data processing to expand it or right-click and choose Expand collapsed task
When expanded, the collapsed task is shown as a shaded section of the pipeline with a title bar (Figure 17).
Double-click the Data processing title bar to re-collapse
[1] Stoeckius, M., Hafemeister, C., Stephenson, W., Houck-Loomis, B., Chattopadhyay, P. K., Swerdlow, H., ... & Smibert, P. (2017). Simultaneous epitope and transcriptome measurement in single cells. Nature methods, 14(9), 865.
[2] Stoeckius, M., Zheng, S., Houck-Loomis, B., Hao, S., Yeung, B. Z., Mauck, W. M., ... & Satija, R. (2018). Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome biology, 19(1), 224.
[3] Mimitou, E., Cheng, A., Montalbano, A., Hao, S., Stoeckius, M., Legut, M., ... & Satija, R. (2018). Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay. bioRxiv, 466466.
This tutorial presents an outline of the basic series of steps for analyzing a single cell RNA-Seq experiment in Partek Flow starting with the count matrix file.
This tutorial includes only one sample, but the same steps will be followed when analyzing multiple samples. For notes on a few aspects specific to a multi-sample analysis, please see our tutorial.
An important step in analyzing single cell RNA-Seq data is to filter out low quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. You can do this in Partek Flow using the Single cell QA/QC task.
Click on the Single cell data node
Click on the QA/QC section of the task menu
Click on Single cell QA/QC
A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running (Figure 1).
Click the Single cell QA/QC node once it finishes running
Double-click the Task report in the task menu
The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 2).
There can be four plots: number of read counts per cell, number of detected genes per cell, the percentage of mitochondrial reads per cell, and the percentage of ribosomal counts.
The plots will be shaded to reflect the selection. Cells that are excluded will be shown as dim dots on all plots.
The read counts per cell and number of detected genes per cell are typically used to filter out potential doublets - if a cell has an unusually high number of total counts or detected genes, it may be a doublet. The mitochondrial reads percentage can be used to identify cells damaged during cell isolation - if a cell has a high percentage of mitochondrial counts, it is likely damaged or dying and may need to be excluded.
A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes (features). Because there is no gold standard for what makes a gene informative and the ideal gene filtering criteria depend on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options. The Filter features step can be performed either before or after normalization.
Click the data node containing the count matrix
Click Filtering in the task menu
Click Filter features
There are four categories of filter available - noise reduction, statistics based, feature metadata, and feature list.
The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.
For example, you can use a noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file.
Click the Noise reduction filter check box
Set the Noise reduction filter to Exclude features where value <= 0 in at least 99.9% of cells using the drop-down menus and text boxes
Click Finish to apply the filter (Figure 3)
This results node, Filtered counts, will be the starting point for the next stage of analysis.
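The noise reduction rule used above (exclude features where value <= 0 in at least 99.9% of cells) can be written out as a short Python sketch. The function name, the dictionary layout, and the example features are assumptions for illustration and do not reflect Partek Flow's internals.

```python
def noise_reduction_filter(matrix, threshold=0.0, min_fraction=0.999):
    """Drop features whose value is <= threshold in at least min_fraction of cells.

    `matrix` maps a feature name to its list of per-cell values.
    """
    kept = {}
    for feature, values in matrix.items():
        low = sum(1 for v in values if v <= threshold)
        if low / len(values) < min_fraction:
            kept[feature] = values
    return kept

n_cells = 2000
# Made-up features with different detection rates.
matrix = {
    "never_detected": [0] * n_cells,
    "one_cell_only": [3] + [0] * (n_cells - 1),
    "five_cells": [1] * 5 + [0] * (n_cells - 5),
}

kept = noise_reduction_filter(matrix)
print(list(kept))
```

With this setting, a feature detected in only one of 2,000 cells (zero in 99.95% of cells) is removed along with features that have no counts at all, while a feature detected in five cells (zero in 99.75%) is retained.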
Because different cells will have a different number of total counts, it is important to normalize the data prior to downstream analysis. For droplet-based single cell isolation and library preparation methods that use a 3' counting strategy, where only the 3' end of each transcript is captured and sequenced, we recommend the following normalization - 1. CPM (counts per million), 2. Add 1, 3. Log2. This accounts for differences in total UMI counts per cell and log transforms the data, which makes the data easier to visualize.
Click the Filtered counts results node produced by the Filter features task
Click Normalization and scaling in the context-sensitive task menu on the right
Click Normalization
This adds CPM (counts per million), Add 1, and Log2 to the Normalization order panel. Normalization steps are performed in descending order.
Click Finish to apply the normalization (Figure 4)
A new Normalized counts data node will be produced. You can choose to change the color of this node by right-clicking on the task node then clicking Change color and/or rename the result node by right-clicking and selecting Rename data node.
In the example below, I have changed the color to dark blue and renamed the results node based on the scheme.
For more information on normalizing data in Partek Flow, please see the Normalize Counts section of the user manual.
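As a rough illustration of the CPM, Add 1, Log2 sequence described in this section, the following Python sketch normalizes the counts of a single cell. The function name and the example counts are hypothetical, and the sketch is not taken from Partek Flow.

```python
import math

def normalize_cell(counts):
    """Apply CPM (counts per million), add 1, then log2 to one cell's counts."""
    total = sum(counts)
    return [math.log2(c * 1_000_000 / total + 1.0) for c in counts]

# Two made-up cells with the same expression proportions but a
# ten-fold difference in total counts.
shallow = [10, 30, 60]
deep = [100, 300, 600]

print(normalize_cell(shallow))
print(normalize_cell(deep))
```

Both cells yield the same normalized values, which is the sense in which the CPM step accounts for differences in total counts per cell. The added 1 keeps zero counts at zero after the log transform.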
Principal components (PC) analysis (PCA) is an exploratory technique that is used to describe the structure of high dimensional data by reducing its dimensionality. Because PCA is used to reduce the dimensionality of the data prior to clustering as part of a standard single cell analysis workflow, it is useful to examine the results of PCA for your data set prior to clustering.
Click the Filtered counts node
Click Exploratory analysis in the task menu
Click PCA from the drop-down list
You can choose Features contribute equally to standardize the genes prior to PCA or allow more variable genes to have a larger effect on the PCA by choosing by variance. By default, we take variance into account and focus on the most variable genes.
If you have multiple samples, you can choose to run PCA for each sample individually or for all samples together by selecting or not selecting the Split by sample option (Figure 5).
Click Finish to run
A new PCA task node will be produced.
Double-click the PCA task node to open the 3D PCA scatter plot in data viewer (Figure 6)
Besides the PCA coordinates of the cells, the PCA task report also includes the Scree plot, the component loadings table, and the PC projections table.
The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured in Eigenvalue. The higher the Eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering, UMAP and t-SNE.
Note that Partek Flow suggests appropriate data for each plot type that is chosen so only PCA results will be available to select from for the Scree plot.
Mouse over the Scree plot to identify the point where additional PCs offer little additional information
In this data set, a reasonable cut-off could be set anywhere between 7 and 20 PCs.
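Reading the Scree plot by eye can be complemented with a simple rule of thumb. The Python sketch below converts eigenvalues to the share of variance each PC explains and counts the leading PCs that stay above a chosen share. The eigenvalues, the 2% threshold, and the function names are made up for illustration; this is one possible heuristic, not the method Partek Flow uses.

```python
def explained_fraction(eigenvalues):
    """Share of the total variance (sum of the listed eigenvalues) per PC."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def pcs_to_keep(eigenvalues, min_share=0.02):
    """Count leading PCs whose share of variance is at least min_share."""
    count = 0
    for share in explained_fraction(eigenvalues):
        if share < min_share:
            break
        count += 1
    return count

# Hypothetical eigenvalues, sorted from PC1 downward.
eigenvalues = [40, 20, 15, 10, 5, 4, 3, 1, 1, 1]
print(pcs_to_keep(eigenvalues))
```

The threshold is a judgment call, which is why a range of cut-offs can be reasonable for the same data set.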
Viewing the genes correlated with each PC can be useful when choosing how many PCs to include.
To display the PCA projections table, click on the Table drop-down list in the Content icon under Configure and choose PCA projections (Figure 9)
In the PCA projections table, each row is an observation (a cell in this case) and each column represents one principal component (Figure 10). This table can be downloaded as a text file, the same way as the component loadings table.
Graph-based clustering identifies groups of similar cells using PC values as the input. By including only the most informative PCs, noise in the data set is excluded, improving the results of clustering.
Click the PCA data node
Click Exploratory analysis in the task menu
Click Graph-based clustering
Clustering can be performed on each sample individually or on all samples together. Here, we are working with a single sample.
Check Compute biomarkers to compute features that are highly expressed when comparing each cluster (Figure 11)
Click Configure to access the Advanced options and change the Number of nearest neighbors to 50 and Nearest Neighbor Type to K-NN for this example tutorial.
The Number of principal components should be set based on your examination of the Scree plot and component loadings table. The default value of 100 is likely exhaustive for most data sets, but may introduce noise that reduces the number of clusters that can be distinguished.
Click Finish to run the task
New Graph-based clusters and Biomarkers data nodes will be generated along with the task nodes.
Double-click the Graph-based clusters node to see the cluster results and statistics (left screenshot on Figure 12)
Double-click the Biomarkers node to see the computed biomarkers if you have selected this option (right screenshot on Figure 12)
The Graph-based clustering result lists the Total number of clusters and what proportion of cells fall into each cluster, as well as the Maximum modularity, which is a measurement of the quality of the clustering result where optimal modularity is 1. The Biomarkers node includes the top features for each graph-based cluster. It displays the top 10 genes that distinguish each cluster from the others. Use Download at the bottom right of the table to view and save more features. The biomarkers are calculated using an ANOVA test comparing the cells in each group to all the other cells, filtering to genes that are 1.5 fold upregulated, and sorting by ascending p-value. This ensures that the top 10 genes of each cluster are highly and disproportionately expressed in that cluster.
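The filter-then-sort step used to pick biomarkers can be sketched as follows. The sketch starts from per-gene fold changes and p-values that are assumed to have been computed already (the ANOVA itself is not shown). The gene names, values, function name, and the use of "at least 1.5" as the cutoff are assumptions.

```python
def top_biomarkers(stats, min_fold_change=1.5, top_n=10):
    """Rank genes for one cluster versus all other cells.

    `stats` is a list of (gene, fold_change, p_value) tuples.
    Keeps genes with fold_change >= min_fold_change, sorted by ascending p-value.
    """
    upregulated = [row for row in stats if row[1] >= min_fold_change]
    upregulated.sort(key=lambda row: row[2])
    return [gene for gene, _, _ in upregulated[:top_n]]

# Made-up statistics for illustration.
stats = [
    ("GENE_1", 3.0, 1e-20),
    ("GENE_2", 1.2, 1e-30),   # very significant, but not 1.5 fold upregulated
    ("GENE_3", 2.0, 1e-25),
    ("GENE_4", 0.5, 1e-40),   # downregulated in this cluster
]

print(top_biomarkers(stats, top_n=2))
```

Filtering on fold change before sorting is what keeps strongly significant but downregulated or weakly changed genes out of the list.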
We will use t-SNE to visualize the results of Graph-based clustering.
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensional reduction technique that prioritizes local relationships to build a low-dimensional representation of the high-dimensional data that places objects that are similar in high-dimensional space close together in the low-dimensional representation. This makes t-SNE well suited for analyzing high-dimensional data when the goal is to identify groups of similar objects, such as cell types in single cell RNA-Seq data.
Click the Graph-based clusters node
Click Exploratory analysis in the task menu
Click t-SNE
If you have multiple samples, you can choose to run t-SNE for each sample individually or for all samples together using the Split cells by sample option. Please note that this option will not be present if you are running t-SNE on a clustering result. For clarity, clustering results run with all samples together must be viewed together and clustering results run by sample must be viewed by sample.
Like Graph-based clustering, t-SNE takes PC values as its input and further reduces the data down to two or three dimensions. For consistency, you should use the same number of PCs as the input for t-SNE that you used for Graph-based clustering.
Click Apply
Click Finish to run (Figure 13)
A new t-SNE task node will be produced.
Double-click the t-SNE node to open the t-SNE task report (Figure 14). Use the panel on the left to modify the plot or add more plots to this Data viewer session.
The t-SNE scatter plot is interactive and can be viewed in 2D or 3D. The t-SNE plot is 3D by default. You can rotate the 3D plot by left-clicking and dragging your mouse or using Control under Configure. You can zoom in and out using your mouse wheel. You can pan by right-clicking and dragging your mouse. You can use Style to modify color, shape, size, and labeling (e.g. add a fog effect to improve depth perception on the plot). Add a 2D plot by clicking New plot, selecting 2D Scatter plot, and selecting t-SNE as the source of the data.
Click on the plot to ensure that the plot window is selected. Click Style under Configure to color the t-SNE.
Color by the options in the drop-down menu under Color. You should be on the normalized counts node which can be seen by hovering over or clicking the circle (node) to the right of the drop-down.
Click the text field in the drop-down and start typing CD79A then select the gene by clicking on it (Figure 15)
The cells on the plot will be colored based on their expression level of CD79A (Figure 16). In the example in Figure 16, the Style icon has been dragged to a different location on the screen and the legend has also been resized and moved. Resizing the legend can either be done on the legend itself or using the Description icon under Configure.
Clicking a cell on the plot shows the expression values of the cell in the legend. Hovering over a cell on the plot also shows this information and related details (Figure 18).
If you want to color by more than three genes at a time, such as by a list of genes that distinguish a particular cell type, you can use the color by Feature list option.
Select Feature List from the Color by drop-down
Choose Cytotoxic cells from the List drop-down (use List management in Settings to add lists to Partek Flow which will automatically make them available here)
Choose PCA from the Metric drop-down
Coloring by a list, in this way, calculates the first three principal components for the gene list and colors the cells on the plot by their values along those three PCs with green for PC1, red for PC2, and blue for PC3 (Figure 19).
Typically, the expression of a set of marker genes will be highly correlated, allowing the first PC to account for a large percentage of the variance between cells for that gene list. As a result, the group of cells characterized by their expression of the genes on the list will separate from the rest of the cells along PC1 and will be colored green (Figure 16). If the gene list is more complex, for example, including marker genes for multiple cell types, there may be several sets of correlated genes accounting for significant amounts of variance, leading to groups of cells being distinguishable along PC2 and PC3 as well. In that case, there may be green, blue, and red groups of cells on the plot. If the gene list does not distinguish any group of cells, all cells will have similar PC values, leading to similarly colored cells on the plot.
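One way to picture this coloring is to rescale each cell's scores on the three PCs to color channels, with PC1 driving green, PC2 red, and PC3 blue as described above. The Python sketch below does this with min-max scaling to 0-255. The scaling method, the function names, and the example scores are assumptions, and the PCA itself is not shown.

```python
def scale_to_byte(values):
    """Min-max scale values to integers in 0..255 (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

def pcs_to_rgb(pc1, pc2, pc3):
    """Map PC1 to green, PC2 to red, and PC3 to blue; returns (R, G, B) per cell."""
    green = scale_to_byte(pc1)
    red = scale_to_byte(pc2)
    blue = scale_to_byte(pc3)
    return list(zip(red, green, blue))

# Made-up PC scores for three cells; only PC1 separates them.
pc1 = [-1.0, 0.0, 3.0]
pc2 = [0.0, 0.0, 0.0]
pc3 = [2.0, 2.0, 2.0]

print(pcs_to_rgb(pc1, pc2, pc3))
```

When only PC1 varies, the cells differ only in their green channel, which matches the case of a marker list dominated by one correlated set of genes.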
In addition to coloring by gene expression and by gene lists, the points can be colored by any cell or sample attribute. Available attributes are listed as options in the Color by drop-down menu. Note that any available options are dependent upon the selected data node. In the following section we will use the attribute Graph-based to color our cells by the clusters identified in the Graph-based clustering task (Figure 20).
Left-click and hold to draw a lasso around a cluster of cells
Release and click the starting circle to close the lasso and select the enclosed cells (Figure 21)
You can also create a lasso with straight lines using Lasso mode by clicking, releasing, and clicking again to draw a shape.
By default, selected cells are shown in bold while unselected cells are dimmed (Figure 22). This can be changed to gray selected cells using the Select & Filter tool in the left panel as shown in Figure 22.
Double-click any blank section of the scatter plot to clear the selection
Alternatively, you can select cells using any criteria available for the data node that is selected in the Select & Filter tool. To change the data selection click the circle (node) and select the data.
Choose Graph-based from the Criteria drop-down menu in the Select & Filter tool after ensuring you are on the Graph-based clusters node by hovering over the circle (Figure 23). If you are not on the correct node, you need to click the circle and select the data.
This adds check boxes for each level of the attribute (i.e., clusters). Click a check box to select the cells with that attribute level.
Click only 2 and 3
This selects cells from Graph-based clusters 2 and 3 (Figure 24). The number of selected cells is listed in the Legend on the plot.
Cells can also be selected based on their gene expression values in the Select & Filter section.
Click the circle and select the Normalized counts node which has gene expression data
Type cd3d in the text field of the drop-down
Click on CD3D to add it as criteria to select from and use the slider or text field to adjust the selected values. Pin the histogram to visualize the distribution during selection.
Very specific selections can be configured by adding criteria in this way. In the example below, cells in clusters 2 and 3 with high CD3D expression are selected (Figure 25).
Once a cell has been selected on the plot, it can be filtered. The filter controls can exclude or include (only) any selected cell. Filtering can be particularly useful when you want to use a gene expression threshold to classify a group of cells, but the gene in question is not exclusively expressed by your cell type of interest.
In this example we can filter to include just cells from the selection we have already made.
The plot will update to show only the included cells as seen in Figure 26.
Cells that are not shown on the plot cannot be selected, allowing you to focus on the visible cells. The number of cells shown on the plot out of the total number of original cells is shown in the Legend. You can adjust the view to focus on only the included cells.
Additional inclusion or exclusion filters can be added to focus on a smaller subset of cells.
Click Clear filters to remove applied filters
The plot will update to show all cells and return to the original scaling.
Classifying cells allows you to assign cells to groups that can be used in downstream analysis and visualizations. Commonly, this is used to describe cell types, such as B cells and T cells, but it can be used to describe any group of cells that you want to consider together in your analysis, such as cycling cells or CD14 high expressing cells. Each cell can only belong to one class at a time, so you cannot create overlapping classes.
To classify a cell, just select it then click Classify selection in the Classify tool.
For example, we can classify a cluster of cells expressing high levels of CD79A as B cells.
Set Color by in the Style configuration to the normalized counts node
Type CD79A in the search box and select it. Rotate the 3D plot if you need to see this cluster more clearly.
Draw a lasso around the cluster of CD79A-expressing cells (Figure 28)
Because most of these cells express CD79A, a B cell marker, and because they cluster together on the t-SNE, suggesting they have similar overall gene expression, we believe that all these cells are B cells.
Click Classify under Tools in the left panel
Type B cells for the Name
Click Save (Figure 29)
Color by New classifications under Style (Figure 30) while you are still working on the classifications
To use the classifications in downstream tasks and visualizations, you must first apply them.
Click Apply classifications
Name the classification (e.g. Classified Cell Types)
Click Run to confirm
Once you have added a classification to the project, you can color the t-SNE plot by the Classification.
Here, I classified a few additional cell types using a combination of known marker genes and the clustering results then applied the classification (Figure 31).
Summarize Classifications with the number and percentage of cells from each sample that belong to each classification using an Attribute table under New plot. This is particularly useful when you are classifying cells from multiple samples.
Click New plot
Select Attribute table and the source of data (Figure 32) which in this case is called Classify result
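The summary described above (number and percentage of cells per classification within each sample) can be reproduced outside the viewer with a short Python sketch. The sample name, the classifications, the cell counts, and the function name are all hypothetical.

```python
from collections import Counter

def attribute_table(cells):
    """Summarize a list of (sample, classification) pairs.

    Returns {(sample, classification): (count, percent_of_sample)}.
    """
    pair_counts = Counter(cells)
    sample_totals = Counter(sample for sample, _ in cells)
    return {
        pair: (n, 100.0 * n / sample_totals[pair[0]])
        for pair, n in pair_counts.items()
    }

# Made-up classifications for a single sample.
cells = [("Sample 1", "B cells")] * 3 + [("Sample 1", "NK cells")]

for (sample, classification), (count, percent) in attribute_table(cells).items():
    print(sample, classification, count, f"{percent:.1f}%")
```

Percentages are computed within each sample, so they remain comparable when samples contain different numbers of cells.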
Click on the Normalized counts node
Navigate to the Compute biomarkers task under Statistics in the task menu
Follow the task dialogue and click Finish (Figure 33)
Double-click the Biomarkers node to view the Biomarkers results
A common goal in single cell analysis is to identify genes that distinguish a cell type. To do this, you can use the differential analysis tools in Partek Flow. I will show how to use the ANOVA test in Partek Flow, a statistical test shown to be highly effective for differential analysis of single cell RNA-Seq data.
Click the Normalized counts results node
Click Statistics in the toolbox
Click Differential Analysis
Select ANOVA as the Method to use for differential analysis
The first page of the configuration dialog asks what attributes you want to include in the statistical test. Here, we only want to consider the Classifications, but in a more complex experiment, you could also include experimental conditions or other sample attributes.
Click Classified Cell Types
Click Next (Figure 34)
We will make a comparison between NK cells and all the other cell types to identify genes that distinguish NK cells. You can also use this tool to identify genes that differ between two cell types or genes that differ in the same cell type between experimental conditions.
Drag NK cells to the top panel
The top panel is the numerator for fold-change calculations so the experimental or test groups should be selected in the top panel.
Click all the other classifications in the bottom panel
The bottom panel is the denominator for fold-change calculations so the control group should be selected in the bottom panel.
Click Add comparison
This adds the comparison to the statistical test.
Click Finish to run the ANOVA task (Figure 35)
Double-click the newly generated data node to open the ANOVA task report
The Feature plot viewer will open showing a dot plot for CCL4 which can be modified to summarize the data in different ways (Figure 37). In the image below, the red boxes highlight the changes that were made to configure the plot. This includes overlaying the violins (density plots with the width corresponding to frequency) on the dot plot represented by the Classified Cell Types.
You can switch the grouping of cells. To do this, show the X axis labels then click and drag the labels to reposition the cell types on the plot.
Click ANOVA report to return to the table
The table lists all of the genes in the data set; using the filter control panel on the left, we can filter to just the genes that are significantly different for the comparison.
Click FDR step up and click the arrow next to it
Set to 1e-8
Here, we are using a very stringent cutoff to focus only on genes that are specific to NK cells, but other applications may require a less stringent cutoff.
Click Fold change and click the arrow next to it
Set to -2 to 2
The number of genes at the top of the filter control panel updates to indicate how many genes are left after the filters are applied.
The ANOVA report will close and a new task, the Differential analysis filter, will run and generate a filtered Feature list data node.
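The two filters applied above can be sketched in Python. This sketch assumes that FDR step up refers to the Benjamini-Hochberg step-up procedure and that fold change is reported as a signed value, where a ratio below 1 is shown as its negative reciprocal; both are assumptions rather than statements from this tutorial. The gene names, p-values, and ratios are made up.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def signed_fold_change(ratio):
    """Ratio of numerator group to denominator group as a signed fold change."""
    return ratio if ratio >= 1 else -1.0 / ratio

genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
p_values = [1e-12, 1e-10, 0.2, 1e-11]
ratios = [8.0, 1.3, 4.0, 0.25]   # NK cells (top panel) / other cells (bottom panel)

fdr = fdr_step_up(p_values)
significant = [
    gene
    for gene, q, ratio in zip(genes, fdr, ratios)
    if q <= 1e-8 and abs(signed_fold_change(ratio)) >= 2
]
print(significant)
```

Under these assumptions, setting the fold-change filter to -2 to 2 removes genes whose signed fold change falls between those values, so both strongly upregulated and strongly downregulated genes are kept.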
Once we have filtered to a list of significantly different genes, we can visualize these genes by generating a heatmap.
Click the Filtered feature list data node produced by the Differential analysis filter
Click Exploratory analysis in the toolbox
Click Hierarchical clustering / heatmap
The hierarchical clustering task will generate the heatmap; choose Heatmap as the plot type. You can choose to Cluster features (genes) and cells (samples) under Feature order and Cell order in the Ordering section. You will almost always want to cluster features as this generates the clear blocks of color that make heatmaps comprehensible. For single cell data sets, you may choose to forgo clustering the cells in favor of ordering them by the attribute of interest. Here, we will not cluster the cells, but instead order them by their classification.
Click Assign order under Cell order
You can filter samples using the Filtering section of the configuration dialog. Here, we will not filter out any samples or cells.
Choose Classification from the Ordering drop-down menu
Drag NK cells to the top of the Sample order
Click Finish to run (Figure 38)
Double-click the Hierarchical cluster task node to open the task report
It may initially be hard to distinguish striking differences in the heatmap. This is common in single cell RNA-Seq data because outlier cells will skew the high and low ends. We can adjust the minimum and maximum of the color scheme to improve the appearance of the heatmap.
Click Heatmap
Toggle on the Range Min and set to -2
Toggle on the Range Max and set to 2
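To see why limiting the color range helps, consider the following Python sketch. It assumes the heatmap values are standardized per gene (mean 0, standard deviation 1), which is an assumption of this example, and then clamps them to the -2 to 2 range for display. The expression values and function names are made up.

```python
import statistics

def standardize(values):
    """Z-score one gene across cells (mean 0, population standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def clamp(value, range_min=-2.0, range_max=2.0):
    """Limit a value to the displayed color range."""
    return max(range_min, min(range_max, value))

# Nine cells with no expression and one outlier cell.
expression = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
z_scores = standardize(expression)
displayed = [clamp(z) for z in z_scores]

print(max(z_scores), max(displayed))
```

Without the limits, the single outlier would define the top of the color scale and compress the differences among the remaining cells into a narrow band of colors.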
Distinct blocks of red and blue are now more pronounced on the plot. Cells are on rows and genes are on columns. Because of the limited number of pixels on the screen, genes are grouped. You can zoom in using the zoom controls or your mouse wheel if you want to view individual gene rows. We can annotate the plot with cell attributes.
Choose Classified Cell Types from the Annotations drop-down menu
Change the Annotation font size under Style in the Annotations section
The plot now includes blocks of color along the left edge indicating the classification of the cells. We can transpose the plot to give the cell labels a bit more space.
Toggle off the Row labels under Axes to remove the sample labels
While a long list of significantly different genes is important information about a cell type, it can be difficult to identify what the biological consequences of these changes might be just by looking at the genes one at a time. Using enrichment analysis, you can identify gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.
Click the Feature list data node produced by the Differential analysis filter
Click Biological interpretation
Click Gene set enrichment
We distribute the gene sets from the Gene Ontology Consortium, but Gene set enrichment can work with any custom or public gene set database.
Choose the latest assembly available from the Gene set drop-down
Click Finish
Double-click the Gene set enrichment task node to open the task report
The Gene set enrichment task report lists gene sets on rows with an enrichment score and p-value for each. It also lists how many genes in the gene set were in the input gene list and how many were not (Figure 40). Clicking the Gene set ID links to the geneontology.org page for the gene set.
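The counts in the report (genes in and not in the list, for each gene set) are the ingredients of a standard over-representation test. The Python sketch below uses a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test, and takes the enrichment score to be the negative natural log of the p-value. Both choices are assumptions about how such a report can be computed, not a description of Partek Flow's implementation, and the numbers are made up.

```python
from math import comb, log

def hypergeom_upper_tail(k, list_size, set_size, background):
    """P(X >= k): chance of seeing at least k gene-set members in the list.

    The list holds list_size genes drawn from `background` genes,
    of which set_size belong to the gene set.
    """
    total = comb(background, list_size)
    upper = min(list_size, set_size)
    hits = sum(
        comb(set_size, i) * comb(background - set_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total

def enrichment_score(p_value):
    """Negative natural log of the p-value (assumed definition)."""
    return -log(p_value)

# Toy example: 10 background genes, a 4-gene set, and a 3-gene list
# that contains 2 members of the set.
p = hypergeom_upper_tail(k=2, list_size=3, set_size=4, background=10)
print(p, enrichment_score(p))
```

A larger overlap than expected by chance gives a smaller p-value and therefore a higher enrichment score.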
In Partek Flow, you can also check for enrichment of KEGG pathways using the Pathway enrichment task. The task is quite similar to the Gene set enrichment task, but uses KEGG pathways as the gene sets.
The task report is similar to the Gene set enrichment task report with enrichment scores, p-values, and the number of genes in and not in the list (Figure 41).
Clicking the KEGG pathway ID in the Pathway enrichment task report opens a KEGG pathway map (Figure 42). The KEGG pathway maps have fold-change and p-value information from the input gene list overlaid on the map, adding a layer of additional information about whether the pathway was upregulated or downregulated in the comparison.
Colors are customizable using the control panel on the left and the plot is interactive. Mousing over gene boxes gives the genes accounted for by the box, with genes present in the input list shown in bold, and the coloring gene shown in red (Figure 43).
Clicking a pathway box opens the map of that pathway, providing an easy way to explore related gene networks.
Let's start by creating a new project.
On the Home page, click New project (Figure 1)
Give the project a name
Click Create project
In the Analyses tab, click Add data
Click 10x Genomics Cell Ranger counts h5 (Figure 2)
Choose the filtered HDF5 file for the MALT sample produced by Cell Ranger
Note that Partek Flow also supports the feature-barcode matrix output (barcodes.tsv, features.tsv, matrix.mtx) from Cell Ranger. The import steps for a feature-barcode matrix are identical to this tutorial.
Click Next
Name the sample MALT (the default is the file name)
Specify the annotation used for the gene expression data (here, we choose Homo sapiens (human) - hg38 and Ensembl Transcripts release 109). If Ensembl 109 is not available from the drop-down list, choose Add annotation and download it.
Check Features with non-zero values across all samples in the Report section
Click Finish (Figure 3)
A Single cell counts data node will be created under the Analyses tab after the file has been imported. We can move on to processing the data.
Click the green icon ( ) to associate additional files or the red icon ( ) to dissociate a file from a sample. You can manually associate multiple files with one sample. Dissociating a file from a sample does not delete the file from the Partek Flow server.
Mouse over the +/- column and click the green icon ( ) to associate one or more files with the sample. Repeat the process for every sample in your project.
The user has the option of specifying an existing folder or creating a new one as the project output directory. To do so, click the icon next to the directory and specify or create a new folder in the dialog box.
Individual categories for the attribute must then be entered. Enter the name of the new category in the New category text box and click Add (Figure 11). The name of the new category will show up in the table. The category can also be edited by clicking or deleted by clicking (visible on mouse-over). Repeat to add additional categories within the attribute.
To hide the context sensitive menu, simply click the symbol on the upper left corner of the context sensitive menu. Clicking the triangles will collapse ( ) or expand ( ) the different categories of tasks that are shown.
A verification dialog will appear (Figure 12). A yellow warning sign will show up if downstream tasks performed by collaborators will be affected. Deleting the task's output files is optional. If this option is not selected, the task nodes will disappear from the Analyses tab but the output files will remain in the project output directory.
Data associated with any data node can be downloaded using the Download data link in the context sensitive menu (Figure 23). Compressed files will be downloaded to the local computer where the user is accessing the Partek Flow server. Note that bigger files (such as unaligned reads) would take longer to download. For guidance, a file size estimate is provided for each data node. These zipped files can easily be imported by the Partek® Genomics Suite® software.
A waiting task may be waiting for upstream tasks to complete ( ) or waiting for more computing resources to be available ( ).
The Action column contains the cancel button ( ) while a task is queued or running. Clicking this button will cancel the task. A trash icon ( ) will appear in the Action column for completed, canceled or failed tasks, and will allow the task to be deleted from the project. Deleting a task in the Queue tab will remove the corresponding nodes in the Analyses tab. Unless the user has admin privileges, a user may only cancel and delete a task that he/she started. The User, End, and Status columns may be used to filter ( ) the table.
Alternatively, you can also delete your projects directly from the Homepage by clicking Delete project under the Actions column (Figure 2).
After clicking Delete project, a page displaying all the files associated with the project appears. Clicking the triangle will expand the list. Select the files to be deleted from the server by clicking the corresponding checkboxes next to each file (Figure 3). By default, all output files generated by the project will be deleted.
If you wish to delete the input files associated with the project, you can do that as well by clicking the Input files checkbox. Note that a warning icon appears next to input files that are used in other projects (Figure 4). These cannot be deleted until all projects associated with them are deleted.
Open a project and on the analysis page, click on the gear button and choose Export project (Figure 1). You can also export the project directly from Partek Flow home page by clicking the icon under the Action column (Figure 2).
Phred Quality Score | Probability of incorrect base call | Base call accuracy |
---|---|---|
Select
To add a collaborator, use the Add member drop-down menu. The drop-down menu will list users you are collaborating with on any project on the current instance of Partek Flow. Click a user name in the drop-down list and then click the button to add them as a collaborator. To add a user you have not collaborated with before, type their exact username (e.g., jsmith) and click the button to add them. Depending on the collaborator's preference settings, an email notification may be sent to the email address associated with their user account. To delete a collaborator, select the next to their username (you will be asked for confirmation).
We will use the default options for quantification. To learn more about the different options, please see the Quantify to annotation model (Partek E/M) user guide or mouse over the next to each option.
Select the appropriate differential analysis method (Figure 2). In this tutorial we are going to use DESeq2, but Partek Flow offers a number of alternatives. Hover the mouse over the symbol for more information on each differential analysis method, or see our Differential Analysis user guide for a more in-depth look.
Click next to the 5uM vs. 0uM comparison
Click to create a data node with only the genes that pass the filter
Click on the gray button on the right hand side of Row annot None available
Click the Export image icon in the top right corner of the plot
To view the genes associated with each GO term, select to open the extra details page. To view additional information about a GO term, click the blue gene set ID to open the linked geneontology.org entry in a new tab.
After selecting the pipeline, you will be prompted to choose the reference genome for alignment, the annotation for quantification, and the contrasts for GSA. After selecting these options, the pipeline will automatically run. See for more information.
For more information about the dot plot, please see the user guide. To return to the DESeq2 report, switch to the browser table with filtered feature list.
In the DESeq2 report, you can select to view additional information about the statistical results for a gene or select to view the region in Chromosome View. Chromosome View is discussed in the next section of the tutorial.
Each track has Configure track and Move track buttons that can be used to modify each track.
To learn more about Chromosome view, please consult the user guide.
Click under Filter on the right
Click under Filter on the right
Click the button
Click the button
To re-collapse the task, you can double click the title bar or click the icon in the title bar. To remove the collapsed task, you can click the . Please note that this will not remove tasks, just the grouping.
If you are new to Partek Flow, please see for information about data transfer and import and for information about the Partek Flow user interface.
Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Cells can be filtered either with the plot controls by using the selection tools on the right of the plot (rectangle mode , ellipse mode , or lasso mode ) and selecting a region on one of the plots or by setting thresholds using the Select & Filter tool. Here, we will apply a filter for the number of read counts.
Open the Select & Filter icon in the left panel. The histograms can be pinned while fine tuning the selections (Figure 2). Set the filters to represent the majority of the population (violin width)
Click the filter icon and choose Apply observation filter, then select the Single cell counts data node. This runs the Filter cells task on the first Single cell counts data node, adding a Filtered counts task node that produces a Filtered cells results node
Use Save as to give this Data Viewer session a new name (e.g. QA/QC filter) so you can return to this filter at any time and see the exact criteria that have been selected and filtered.
Click to add the recommended normalization scheme
To draw a Scree plot in the Data viewer, choose the Scree plot icon in New plot under Setup on the left panel, then choose the PCA data node (Figure 7)
Click the Table option in the New plot icon under Setup and select the PCA data node to open the Component loadings table (Figure 8)
This table lists genes on rows and PCs on columns; each value in the table is the correlation coefficient r. The table can be downloaded as a text file by clicking on the Export table data icon on the upper-right corner of the plot.
Coloring by one gene uses the two-color numeric palette, which can be customized by clicking . To color by more than one gene use the Numeric triad option in the drop-down. If you color by more than one gene, the color palette switches to a Green-Red-Blue color scheme with the balance between the three color channels determined by the values of the three genes. For example, a cell that expresses all three genes would be white, a cell that expresses the first two genes would be yellow, and a cell that expresses none of the genes would be black (Figure 17).
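The three-channel coloring described above can be sketched as follows. This is only an illustration, assuming the first gene drives the green channel, the second the red channel, and the third the blue channel, with each value scaled linearly to its maximum. The channel order, the scaling, and the function name are assumptions, not the documented Partek Flow implementation.

```python
def triad_color(values, maxima):
    """Map three expression values to an (R, G, B) tuple in the range 0-255.

    values : expression of (gene 1, gene 2, gene 3) in one cell
    maxima : the value that maps to full intensity for each gene
    Assumed channel order: gene 1 -> green, gene 2 -> red, gene 3 -> blue.
    """
    green, red, blue = (
        0 if top <= 0 else round(255 * min(max(value / top, 0.0), 1.0))
        for value, top in zip(values, maxima)
    )
    return (red, green, blue)


maxima = (5.0, 5.0, 5.0)  # hypothetical full-intensity values

print(triad_color((5.0, 5.0, 5.0), maxima))  # all three genes expressed: white
print(triad_color((5.0, 5.0, 0.0), maxima))  # first two genes only: yellow
print(triad_color((0.0, 0.0, 0.0), maxima))  # no genes expressed: black
```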
The most basic way to select a point on the scatter plot is to click it with the mouse while in pointer mode. To select multiple cells, you can hold Ctrl on your keyboard and click the cells. To select larger groups of cells, you can switch to Lasso mode by clicking in the plot controls on the right hand side. The lasso lets you freely draw a shape to select a cluster of cells.
Click to activate Lasso mode
Click (filter include) to filter to just the selected cells (Figure 26).
Click on the plot controls or toggle on Fit visible in the Axes configuration to rescale the axes to the filtered points
To revert to the original scaling, click the button again or turn off Fit visible with the toggle.
Alternatively, to exclude selected cells, click (filter exclude) (Figure 27)
Click to activate Lasso mode
You can edit the name of a classification or delete it. In this project we use the hosted feature lists for "NK cells", "T cells" and "Monocytes" to classify these cell types by coloring the cells in the t-SNE plot and selecting the cells expressing those genes as shown above. See the documentation for more information on how to add these lists. The classifications you have made are saved as a working draft so if you close the plot and return to it, the classifications will still be there and can be visualized on the plot as "New classification". However, classifications are not available for downstream tasks until you apply them. Continue classifying the clusters and save the Data viewer session until you are ready to apply the classification to the data project.
The ANOVA task report lists genes on rows and the results of the statistical test (p-value, fold change, etc.) on columns (Figure 36). For more information, please see our documentation page on the .
Genes are listed in ascending order by the p-value of the first comparison so the most significant gene is listed first. To view a volcano plot for any comparison, click . To view a violin plot for a gene, click next to the Gene ID.
Click for CCL4
Click to generate a filtered version of the table for downstream analysis
For more information about the ANOVA task, please see the section of our user manual.
Click Transposed under Axes or use the transpose button on the plot to flip the axes
As with any visualization in Partek Flow, the image can be saved as a publication-quality image to your local machine by clicking or sent to a page in the project notebook by clicking . For more information about Hierarchical clustering, please see the section of the user manual.
For information about automating steps in this analysis workflow, please see our documentation page on .
Move the .h5 file to where Partek Flow is installed using , then browse to its location.
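Phred Quality Score | Probability of incorrect base call | Base call accuracy |
---|---|---|
20 | 1 in 100 | 99% |
30 | 1 in 1000 | 99.9% |
40 | 1 in 10,000 | 99.99% |
50 | 1 in 100,000 | 99.999% |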
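The values listed above follow from the standard Phred definition, Q = -10 * log10(P), where P is the probability of an incorrect base call. A short sketch of the conversion in both directions (the function names are hypothetical):

```python
import math


def phred_to_error_probability(q):
    """Probability of an incorrect base call for Phred score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (20, 30, 40, 50):
    p = phred_to_error_probability(q)
    accuracy = 100 * (1 - p)
    print(f"Q{q}: 1 in {round(1 / p):,} incorrect, accuracy {accuracy:.3f}%")
```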
In this tutorial, we demonstrate how to:
The tutorial data is based on 10x Genomics Datasets.
In this tutorial, we demonstrate how to:
The tutorial is based on the work published by Venteicher and co-workers, on isocitrate dehydrogenase-mutant gliomas. Single cells from tumor biopsies were processed by flow cytometry and the libraries were prepared by Smart-seq2 protocol. The tutorial data set consists of eight expression matrix files, one per patient sample. The tumors were categorized as either astrocytoma or oligodendroglioma glioma subtype by histology. The matrix files contain gene expression values normalized by the following transformation log2[(TPM/10)+1].
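The transformation log2[(TPM/10)+1] and its inverse can be written out directly. This is a minimal sketch with hypothetical function names, useful if you need to move between the normalized values in the matrix files and TPM:

```python
import math


def tpm_to_normalized(tpm):
    """Apply the transformation described above: log2((TPM / 10) + 1)."""
    return math.log2(tpm / 10 + 1)


def normalized_to_tpm(value):
    """Invert the transformation: TPM = 10 * (2 ** value - 1)."""
    return 10 * (2 ** value - 1)


for tpm in (0, 10, 150):
    value = tpm_to_normalized(tpm)
    print(f"TPM {tpm} -> {value:.3f} -> TPM {normalized_to_tpm(value):.1f}")
```

A TPM of 0 maps to 0 and a TPM of 10 maps to 1, so zero expression stays at zero after the transformation.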
Next, we will perform some exploratory analysis on the merged mRNA and protein expression data and visualize the data in preparation to identify cell populations. Because the merged count matrix has thousands of features, it is a good idea to reduce the dimensionality of the data for more efficient downstream processing.
Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click PCA
Click Finish to run the PCA with default settings (Figure 1)
A PCA task node will be added to the pipeline under the Analyses tab and a circular PCA output data node will be produced (Figure 2).
Once the task completes, we will inspect the results to decide the optimal number of principal components (PCs) to use in downstream analyses. To do this, we will use a Scree plot.
Double click the PCA data node to open the task report
The PCA plot will open in a new data viewer session. A 3D scatterplot will be displayed on the canvas (Figure 3).
Click and drag the Scree plot from New plot under Setup on the left onto the canvas
Drop it over the Replace option (Figure 4)
Select PCA as data for the new Scree plot (Figure 5)
The Scree plot (Figure 6) shows the eigenvalues on the y-axis for each of the 100 PCs on the x-axis. The higher the eigenvalue, the more variance explained by each PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional components is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering and UMAP.
Click and drag over the first set of PCs to zoom in (Figure 7)
Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 8)
In this data set, a reasonable cut-off could be set anywhere between around 10 and 30 PCs. We will use 15 in downstream steps.
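In this tutorial the cut-off is chosen by eye, but the idea of finding where the Scree plot levels off can also be sketched numerically. The example below is a simple heuristic, not the method used by Partek Flow: it keeps PCs until the drop between consecutive eigenvalues falls below a small fraction of the first eigenvalue. The eigenvalues, the tolerance, and the function names are hypothetical.

```python
def variance_explained(eigenvalues):
    """Fraction of the total variance explained by each PC."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def level_off_point(eigenvalues, tolerance=0.01):
    """Number of leading PCs to keep before the curve flattens.

    The curve is treated as flat once the drop to the next eigenvalue is
    smaller than tolerance * (first eigenvalue).
    """
    threshold = tolerance * eigenvalues[0]
    for index in range(len(eigenvalues) - 1):
        if eigenvalues[index] - eigenvalues[index + 1] < threshold:
            return index + 1
    return len(eigenvalues)


# Hypothetical eigenvalues, sorted from the first PC onward
eigenvalues = [50.0, 30.0, 12.0, 5.0, 4.8, 4.7, 4.6]

n_pcs = level_off_point(eigenvalues)
kept = sum(variance_explained(eigenvalues)[:n_pcs])
print(f"Keep {n_pcs} PCs, explaining {kept:.1%} of the variance")
```

Because any such rule is sensitive to the tolerance, it is best used as a cross-check on the visual inspection rather than a replacement for it.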
We can use Graph-based clustering to group similar cells together in an unsupervised manner.
Click the project name near the top to go back to the Analyses tab
Click the circular PCA data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Click to Compute biomarkers
Set the number of principal components to 15 (Figure 9)
Click Configure under Advanced options and change the Resolution to 1.0
Click Finish to run the task
A Graph-based clustering task node will be added to the pipeline under the Analyses tab and a circular Graph-based clusters output data node will be produced (Figure 10)
Once the graph-based clustering task has completed, we can visualize the results with a UMAP plot. You could use the same steps here to generate a t-SNE plot. For this tutorial, we will use UMAP, as it is faster on several thousand cells.
Click the circular PCA data node
Click Exploratory analysis in the toolbox
Click UMAP
Set the number of principal components to 15 (Figure 11)
Click Finish to run the task
A UMAP task node will be added to the pipeline under the Analyses tab and a circular UMAP output data node will be produced (Figure 12)
In this tutorial, we have performed exploratory analysis on merged protein and gene expression data, and we will perform classification on the merged data in the next step.
It can be interesting to perform exploratory analysis on the two feature types separately. For example, you might be interested to see how the clustering of the same cells differs between protein expression profiles vs. gene expression profiles.
To perform exploratory analysis on the two feature types separately, select the Merged counts data node, click Pre-analysis tools, followed by Split by feature type from the toolbox. A new task, Split by feature type, will be added to the pipeline resulting in two output data nodes: Antibody capture (protein data) and Gene expression (mRNA data). Both contain the same high-quality cells.
Performing exploratory analysis with gene expression data is the same as for the merged counts. Because there are a large number of genes, you will need to reduce the dimensionality with PCA, choose an optimal number of PCs and perform downstream clustering and visualization (e.g. graph-based clustering and UMAP/t-SNE). Performing exploratory analysis with protein data is different. There is no need to reduce the dimensionality as there are only a handful of features (17 proteins in this case), so you can proceed straight to downstream clustering and visualization. Figure 13 shows an example of how the pipeline might look if the data is split and analyzed separately.
You can then use the Data viewer to bring together multiple plots for comparison (Figure 14).
The fastq files are not pre-processed. The steps covered here will show you how to import and pre-process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images.
The sample used for this tutorial can be found in the 10x Genomics Datasets. We will use the Control, replicate 1 mouse brain sample.
Choose the 10x Genomics Visium fastq import format
Click Next
If you have not transferred files to the server already, click here for more details and choose to Transfer files to the server.
Select the fastq files in the upload folder used for file transfer (select all sample files at one time; including R1 and R2 for each sample)
Click Finish
The prefix used for R1 and R2 fastq files should match; one sample is shown in this example.
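Before uploading, it can help to confirm that every R1 file has a matching R2 file. The sketch below assumes file names that contain an `_R1` or `_R2` token followed by `_` or `.`. The naming pattern, the file names, and the function names are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Matches the read token, e.g. "_R1" in "sample_S1_L001_R1_001.fastq.gz"
READ_TOKEN = re.compile(r"_(R[12])(?=[_.])")


def pair_fastq_files(filenames):
    """Group fastq files by their name with the R1/R2 token removed."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = READ_TOKEN.search(name)
        if match is None:
            raise ValueError(f"no R1/R2 token found in {name!r}")
        key = name[:match.start()] + name[match.end():]
        pairs[key][match.group(1)] = name
    return dict(pairs)


def unpaired(pairs):
    """Return the keys that do not have both an R1 and an R2 file."""
    return sorted(key for key, reads in pairs.items() if set(reads) != {"R1", "R2"})


# Hypothetical file names
files = [
    "ctrl_S1_L001_R1_001.fastq.gz",
    "ctrl_S1_L001_R2_001.fastq.gz",
    "treated_S2_L001_R1_001.fastq.gz",
]

pairs = pair_fastq_files(files)
print("Files missing a mate:", unpaired(pairs))
```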
The fastq files will be imported into the project as an Unaligned reads node.
The unaligned reads must be preprocessed before proceeding with the analysis steps covered here: Spatial data analysis.
From the unaligned reads node, select Space Ranger from the 10x Genomics drop-down in the toolbox.
For more information about Space Ranger click here.
Specify the type of 10x Visium assay; this tutorial uses the Visium CytAssist gene expression library as the assay type
If you have not done so already, a Cell Ranger reference should be created
Specify the Reference assembly
Select the Image and Probe set files that have already been transferred to the server for all samples
Choose visium-2-large as the Slide parameter because this Visium CytAssist sample used an 11 x 11 slide capture area
Click Finish
The Space Ranger task output results in a Single cell counts node.
The tissue image must be annotated to associate the microscopy image with the expression data.
Click the newly created Single cell counts data node
Click the Annotation/Metadata section in the toolbox
Click Annotate Visium image
Click on the Browse button to open the file browser and point to the file _spatial.zip, created by the Space Ranger task
Click Finish
Select the zipped image folder for each sample. The image zip file should contain 6 files: the image files, a tissue position text file, and a scale factor json file. The setup page shows the sample table (one sample per row).
You can find the location of the _spatial.zip file using the following steps. Select the Space Ranger task node (i.e. the rectangle) and then click on the Task Details (toolbox). Click on the Output files link to open the page with the list of files created by the Space Ranger task. Mouse over any of the files to see the directory in which the file is located. The figure below shows the path to the .zip file which is required for Annotate Visium image.
Mousing over a file on the Output files page shows a balloon with the file location.
A new data node, Annotated counts, will be generated.
The Annotated counts node is Split by sample. This means that any tasks performed from this node will also be split by sample. Invoke tasks from the Single cell counts node to combine samples for analyses.
Annotate Visium image task creates a new node, Annotated counts. Double click on the Annotated counts node to invoke the Data Viewer showing data points overlaid on top of the microscopy image.
Data Viewer session as a result of opening an Annotated counts data node. Each data point is a tissue spot.
Proceed with analysis from the Single cell counts node. Click here to learn about viewing the multiple tissue images in the Data Viewer.
Next, we will filter out certain cells and re-split the data. Re-splitting the data can be useful if you want to perform differential analysis and downstream analysis separately for proteins and genes. For your own analyses, re-splitting the data is optional. You could just as well continue with differential analysis with the merged data if you prefer.
Because we have classified our cells, we can now filter based on those classifications. This can be used to focus on a single cell type for re-clustering and sub-classification or to exclude cells that are not of interest for downstream analysis.
Click the Merged counts data node
Click Filtering
Click Filter cells
Set to exclude Cell type is Doublets using the drop-down menus
Click OR
Set the second filter to exclude Cell type is N/A using the drop-down menus
Click Finish to apply the filter (Figure 1)
This produces a Filtered counts data node (Figure 2).
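The logic of this filter (exclude a cell if its Cell type is Doublets OR N/A) can be sketched in a few lines. This is a conceptual illustration only, not how Partek Flow implements the Filter cells task; the barcodes and the function name are hypothetical.

```python
# Hypothetical cell annotations
cells = [
    {"barcode": "cell_1", "cell_type": "Activated B cells"},
    {"barcode": "cell_2", "cell_type": "Doublets"},
    {"barcode": "cell_3", "cell_type": "N/A"},
    {"barcode": "cell_4", "cell_type": "Mature B cells"},
]

# The two exclusion rules joined by OR in the filter dialog
EXCLUDED_TYPES = {"Doublets", "N/A"}


def filter_cells(cells, excluded=EXCLUDED_TYPES):
    """Keep only the cells whose classification is not in the excluded set."""
    return [cell for cell in cells if cell["cell_type"] not in excluded]


kept = filter_cells(cells)
print([cell["barcode"] for cell in kept])
```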
Click the Filtered counts data node
Click Pre-analysis tools
Click Split by feature type
This will produce two data nodes, one for each data type (Figure 3). The split data nodes will both retain cell classification information.
Once we have classified our cells, we can use this information to perform comparisons between cell types or between experimental groups for a cell type. In this project, we only have a single sample, so we will compare cell types.
Click the Antibody Capture data node
Click Statistics
Click Differential analysis
Click ANOVA then click Next
The first step is to choose which attributes we want to consider in the statistical test.
Click Cell type
Click Add factor
Click Next
Next, we will set up the comparison we want to make. Here, we will compare the Activated and Mature B cells.
Drag Activated B cells in the top panel
Drag Mature B cells in the bottom panel
Click Add comparison
The comparison should appear in the table as Activated B cells vs. Mature B cells.
Click Finish to run the statistical test (Figure 4)
The ANOVA task produces an ANOVA data node.
Double-click the ANOVA data node to open the task report
The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 5).
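To make the adjusted p-value column concrete, here is a minimal sketch that assumes "FDR step up" refers to the Benjamini-Hochberg step-up procedure. That reading is an assumption, and the p-values and function name are hypothetical.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
for raw, adj in zip(p_values, fdr_step_up(p_values)):
    print(f"p = {raw:.2f} -> adjusted p = {adj:.4f}")
```

Each adjusted value is at least as large as the raw p-value, which is why a filter on the adjusted column is more conservative than one on the raw p-value.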
In addition to the listed information, we can access dot and violin plots for each gene or protein from this table.
This opens a dot plot in a new data viewer session, showing CD45A expression for cells in each of the classifications (Figure 6). First, we exclude Doublets and N/A cells from the plot:
Open Select and filter, select Criteria
Drag "Cell type" from the legend title to the Add criteria box
Uncheck Doublets and N/A
Click to include selected points
We can use the Configuration panel on the left to edit this plot.
Open the Style icon
Switch on Violins under Summary
Switch on Overlay under Summary
Switch on Colored under Summary
Select the Graph-based clustering node in the Color by section
Color by Graph-based clusters under Color and use the slider to decrease the Opacity
Open the Axes icon
Select the Graph-based clustering node in the X axis section
Change the X axis data to Graph-based clusters
Use the slider to increase the Jitter on the X axis (Figure 7)
Click the project name to return to the Analyses tab
To visualize all of the proteins at the same time, we can make a hierarchical clustering heat map.
Click the ANOVA data node
Click Exploratory analysis in the toolbox
Click Hierarchical clustering/heatmap
In the Cell order section, choose Graph-based clusters from the Assign order drop-down list
Click Finish to run with the other default settings
Double-click the Hierarchical clustering task node to open the heatmap
The heatmap can easily be customized using the tools on the left.
Open the Axes icon
Switch off Show Row labels
Increase the Font to 16 (Figure 8)
Activate the Transpose switch which will switch the Row and Column labels, so now the Row labels will be shown (Figure 9)
Open the Dendrograms icon
Choose Row color By cluster and change Row clusters to 4
Change Row dendrogram size to 80 (Figure 10)
In the Heatmap icon
Navigate to Range under Color
Set the Min and Max to -1.2 and 1.2, respectively
Change the Shape to Circle (Figure 11)
Switch the Shape back to Rectangle
Change the Color Palette by clicking on the color squares and selecting colors from the rainbow. Click outside of the selection box to exit this selection. The color options can be dragged along the Palette to highlight value differences (Figure 12).
Feel free to explore the other tool options on the left to customize the plot further.
We can use a similar approach to analyze the gene expression data.
Click the project name to return to the Analyses tab
Click the Gene Expression data node
Click Statistics
Click Differential analysis
Click ANOVA then click Next
Click Cell type
Click Add factor
Click Next
Drag Activated B cells in the top panel
Drag Mature B cells in the bottom panel
Click Add comparison
The comparison should appear in the table as Activated B cells vs. Mature B cells.
Click Finish to run the statistical test
As before, this will generate an ANOVA task node and an ANOVA data node.
Double-click the ANOVA task node to open the task report (Figure 13)
Because more than 20,000 genes have been analyzed, it is useful to use a volcano plot to get an idea about the overall changes.
The Volcano plot opens in a new data viewer session, in a new tab in the web browser. It shows each gene as a point with cutoff lines set for P-value (y-axis) and fold-change (x-axis). By default, the P-value cutoff is set to 0.05 and the fold-change cutoff is set at |2| (Figure 14).
Click the ANOVA report tab in your web browser to return to the full report
We can filter the full set of genes to include only the significantly different genes using the filter panel on the left.
Click FDR step up
Type 0.05 for the cutoff and press Enter on your keyboard
Click Fold change
Set to From -2 to 2 and press Enter on your keyboard
The number at the top of the filter will update to show the number of included genes (Figure 15).
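The combined effect of the two filters can be sketched as follows: a gene is kept if its FDR step up value is at or below 0.05 and its fold change lies outside the range from -2 to 2. Whether the boundary values themselves are included is an assumption here, and the gene names, values, and function name are hypothetical.

```python
def passes_filter(fdr, fold_change, fdr_cutoff=0.05, fold_change_cutoff=2.0):
    """True if a gene passes both the FDR and the fold-change filter."""
    return fdr <= fdr_cutoff and abs(fold_change) >= fold_change_cutoff


# Hypothetical rows from a differential analysis report
results = [
    {"feature": "gene_A", "fdr": 0.001, "fold_change": 3.2},
    {"feature": "gene_B", "fdr": 0.20, "fold_change": -4.0},
    {"feature": "gene_C", "fdr": 0.01, "fold_change": 1.3},
    {"feature": "gene_D", "fdr": 0.04, "fold_change": -2.5},
]

significant = [
    row["feature"]
    for row in results
    if passes_filter(row["fdr"], row["fold_change"])
]
print(f"{len(significant)} of {len(results)} genes included: {significant}")
```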
A task, Differential analysis filter, will run and generate a new Filtered Feature list data node. We can get a better idea about the biology underlying these gene expression changes using gene set or pathway enrichment. Note, you need to have the Pathway toolkit enabled to perform the next steps.
Click the Filtered feature list data node
Click Biological interpretation in the toolbox
Click Pathway enrichment
Make sure that Homo sapiens is selected in the Species drop-down menu
Click Finish to run
Double-click the Pathway enrichment task node to open the task report
The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 16).
To get a better idea about the changes in each enriched pathway, we can view an interactive KEGG pathway map.
Click path:hsa05202 in the Transcriptional misregulation in cancer row
The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 17).
We will now examine the results of our exploratory analysis and use a combination of techniques to classify different subsets of T and B cells in the MALT sample.
Double click the merged UMAP data node
Under Configure on the left, click Style, select the Graph-based cluster node, and color by the Graph-based attribute (Figure 1)
The 3D UMAP plot opens in a new data viewer session (Figure 2). Each point is a different cell and they are clustered based on how similar their expression profiles are across proteins and genes. Because a graph-based clustering task was performed upstream, a biomarker table is also displayed under the plot. This table lists the proteins and genes that are most highly expressed in each graph-based cluster. The graph-based clustering found 11 clusters, so there are 11 columns in the biomarker table.
Click and drag the 2D scatter plot icon from New plot onto the canvas (Figure 2)
Drop the 2D scatter plot to the right of the UMAP plot
Click Merged counts to use as data for the 2D scatter plot (Figure 3)
A 2D scatter plot has been added to the right of the UMAP plot. The points in the 2D scatter plot are the same cells as in the UMAP, but they are positioned along the x- and y-axes according to their expression level for two protein markers: CD3_TotalSeqB and CD4_TotalSeqB, respectively (Figure 4).
In Select & Filter, click Criteria to change the selection mode
Click the blue circle next to the Add rule drop-down menu (Figure 5)
Click Merged counts to change the data source
Choose CD3_TotalSeqB from the drop-down list (Figure 6)
Click and drag the slider on the CD3_TotalSeqB selection rule to include the CD3 positive cells (Figure 7)
As you move the slider up and down, the corresponding points on both plots will dynamically update. The cells with a high expression for the CD3 protein marker (a marker for T cells) are highlighted and the deselected points are dimmed (Figure 8).
Click Merged counts in Get data on the left under Setup
Click and drag CD8a_TotalSeqB onto the 2D scatter plot (Figure 9)
Drop CD8a_TotalSeqB onto the x-axis configuration option
The CD3 positive cells are still selected, but now you can see how they separate into CD4 and CD8 positive populations (Figure 10).
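The selection logic behind these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-cell values, the `select_at_least` helper, and the `CD3_MINIMUM` cutoff are hypothetical, and only the marker names come from the tutorial. It is not how Partek Flow implements the Select & Filter tool.

```python
from typing import Dict, List

# Made-up protein values for four cells; only the marker names come from the tutorial.
cells: List[Dict[str, float]] = [
    {"CD3_TotalSeqB": 5.1, "CD4_TotalSeqB": 6.0, "CD8a_TotalSeqB": 0.4},
    {"CD3_TotalSeqB": 4.8, "CD4_TotalSeqB": 0.3, "CD8a_TotalSeqB": 5.5},
    {"CD3_TotalSeqB": 0.2, "CD4_TotalSeqB": 0.1, "CD8a_TotalSeqB": 0.0},
    {"CD3_TotalSeqB": 3.9, "CD4_TotalSeqB": 5.2, "CD8a_TotalSeqB": 0.6},
]


def select_at_least(cells: List[Dict[str, float]], marker: str, minimum: float) -> List[int]:
    """Return the indices of cells whose value for `marker` is >= `minimum`."""
    return [i for i, cell in enumerate(cells) if cell[marker] >= minimum]


CD3_MINIMUM = 3.0  # hypothetical cutoff, standing in for the slider position

# Step 1: keep the CD3 positive cells (putative T cells).
t_cells = select_at_least(cells, "CD3_TotalSeqB", CD3_MINIMUM)

# Step 2: within that selection, see which protein marker dominates in each cell.
cd4_side = [i for i in t_cells if cells[i]["CD4_TotalSeqB"] > cells[i]["CD8a_TotalSeqB"]]
cd8_side = [i for i in t_cells if i not in cd4_side]

print(t_cells, cd4_side, cd8_side)
```

The second step mirrors what the 2D scatter plot shows visually: the selected cells fall along either the CD4 axis or the CD8a axis.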
The simplest way to classify cell types is to look for the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone, as the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. Let's compare the resolving power of the CD4 and CD8A gene expression markers with that of their protein counterparts.
Click the duplicate plot icon above the 2D scatter plot (Figure 11)
Click Merged counts in the Get Data icon under Setup
Search for the CD4 gene
Click and drag CD4 onto the duplicated 2D scatter plot
Drop the CD4 gene onto the y-axis option
Search for the CD8A gene
Click and drag CD8A onto the duplicated 2D scatter plot
Drop the CD8A gene onto the x-axis option
The second 2D scatter plot has the CD8A and CD4 mRNA markers on the x- and y-axis, respectively (Figure 12). The protein expression data has a better dynamic range than the gene expression data, making it easier to identify sub-populations.
Manually select the cells with high expression of the CD4_TotalSeqB protein marker (Figure 13)
More than 2000 cells show positive expression for the CD4 cell surface protein.
Let's perform the same test on the gene expression data.
Click on a blank spot on the plot to clear the selection
Manually select the cells with high expression of the CD4 gene marker (Figure 14)
This time, only 500 cells show positive expression for the CD4 marker gene. This means that the protein data is less sparse (i.e. there are fewer zero counts), which further helps to reliably detect sub-populations.
Based on the exploratory analysis above, most of the CD3 positive cells are in the group of cells on the right side of the UMAP plot. This is likely to be a group of T cells. We will now examine this group in more detail to identify T cell sub-populations.
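Sparsity here simply means the fraction of cells with a zero count for a given feature. A minimal sketch of that calculation follows; the two count vectors are invented for illustration and are not taken from the MALT data set, and `zero_fraction` is a hypothetical helper name.

```python
from typing import Sequence


def zero_fraction(values: Sequence[float]) -> float:
    """Fraction of entries that are exactly zero (higher means sparser)."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v == 0) / len(values)


# Invented counts for the same eight cells: one protein marker, one gene.
protein_counts = [12, 8, 0, 15, 9, 11, 7, 10]
mrna_counts = [0, 1, 0, 0, 2, 0, 0, 1]

print(zero_fraction(protein_counts))  # 0.125
print(zero_fraction(mrna_counts))     # 0.625
```

A feature with a lower zero fraction gives more cells a usable signal, which is why the protein marker separates positive and negative populations more cleanly in this example.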
Draw a lasso around the group of putative T cells (Figure 15)
Click and drag the plot to rotate it around
This group of putative T cells predominantly consists of cells assigned to graph-based clusters 3, 4, and 6, indicated by the colors. Examining the biomarker table for these clusters can help us infer different types of T cell.
Click and drag the bar between the UMAP plot and the biomarker table to resize the biomarker table to see more of it (Figure 17)
Cluster 6 has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Another biomarker is the PD-1 protein, which is expressed in Tfh cells. This protein promotes self-tolerance and is a target for immunotherapy drugs. The TIGIT protein is also expressed in cluster 6 and is another immunotherapy drug target that promotes self-tolerance.
Cluster 4 expresses several marker genes associated with cytotoxicity (e.g. NKG7 and GZMA) and both CD3 and CD8 proteins. Thus, these are likely to be cytotoxic cells.
We can visually confirm these expression patterns and assess the specificity of these markers by coloring the cells on the UMAP plot based on their expression of these markers.
Click the duplicate plot icon above the UMAP plot
We will color the cells on the duplicate by their expression of marker genes, while keeping the original plot colored by graph-based cluster assignment.
Click and drag the CXCL13 gene from the biomarker table onto the duplicate UMAP plot
Drop the CXCL13 gene onto the Green (feature) option (Figure 18)
Click and drag the NKG7 gene from the biomarker table onto the duplicate UMAP plot
Drop the NKG7 gene onto the Red (feature) option
The cells with higher CXCL13 and NKG7 expression are now colored green and red, respectively. By looking at the two UMAP plots side by side, you can see these two marker genes are localized in graph-based clusters 6 and 4, respectively (Figure 19).
Click the blue circle next to the Add criteria drop-down list
Search for Graph to search for a data source
Select Graph-based clustering (derived from the Merged counts > PCA data nodes)
Click the Add criteria drop-down list and choose Graph-based to add a selection rule (Figure 20)
In the Graph-based filtering rule, click All to deselect all cells
Click cluster 6 to select all cells in cluster 6
Using the Classify tool, click Classify selection
Label the cells as Tfh cells (Figure 21)
Click Save
Click cluster 4 to select all cells in cluster 4
In the Classify icon, click Classify selection
Label the cells as Cytotoxic cells
Click Save
We can classify the remaining cells as helper T cells, as they predominantly express the CD4 protein marker.
Click on the invert selection icon in either of the UMAP plots (Figure 22)
In Classify, click Classify selection
Label the cells as Helper T cells
Click Save
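Conceptually, the three classification steps above assign a label to two specific clusters and then give every remaining selected cell a default label (the invert selection step). A hedged sketch of that logic, using a hypothetical `classify_by_cluster` helper and made-up cluster assignments:

```python
from typing import Dict, List


def classify_by_cluster(cluster_ids: List[int], labels: Dict[int, str], default: str) -> List[str]:
    """Label each cell by its cluster; clusters without a label get `default`."""
    return [labels.get(cluster, default) for cluster in cluster_ids]


# Made-up cluster assignments for six putative T cells.
cluster_ids = [3, 4, 6, 6, 4, 3]

# Labels used in the tutorial for clusters 6 and 4.
labels = {6: "Tfh cells", 4: "Cytotoxic cells"}

# Everything else in the selection is labelled as helper T cells.
cell_types = classify_by_cluster(cluster_ids, labels, default="Helper T cells")
print(cell_types)
```

This only illustrates the bookkeeping; in Partek Flow the labels are stored through the Classify tool and later applied as a cell attribute.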
Let's look at our progress so far, before we classify subsets of B cells.
Click the Clear filters link in Select & Filter
Select the duplicate UMAP plot (with the cells colored by marker genes)
Under Configure on the left, open Style and color the cells by New classifications (Figure 23)
In addition to T cells, we would expect to see B lymphocytes, at least some of which are malignant, in a MALT tumor sample. We can color the plot by expression of a B cell marker to locate these cells on the UMAP plot.
In the Get data icon on the left, click Merged counts
Scroll down or use the search bar to find the CD19_TotalSeqB protein marker
Click and drag the CD19_TotalSeqB marker over to the UMAP plot on the right
Drop the CD19_TotalSeqB marker over the Color configuration option on the plot
The cells in the UMAP plot are now colored from grey to blue according to their expression level for the CD19 protein marker (Figure 24). The CD19 positive cells correspond to several graph-based clusters. We can filter to these cells to examine them more closely.
Lasso around the CD19 positive cells (Figure 25)
The plots will rescale to include the selected points. The CD19 positive cells include cells from graph-based clusters 1, 2 and 7 (Figure 26).
Find the CD3_TotalSeqB protein marker in the biomarker table
Click and drag the CD3_TotalSeqB onto the UMAP plot on the right
Drop the CD3_TotalSeqB protein marker onto the Color configuration option on the plot (Figure 27)
While these cells express T cell markers, they also group closely with other putative B cells and express B cell markers (CD19). Therefore, these cells are likely to be doublets.
Select either of the UMAP plots
Click Select & Filter
Find the CD3_TotalSeqB protein marker in the biomarker table
Click and drag CD3_TotalSeqB onto the Add criteria drop-down list in Select & Filter (Figure 28)
Set the minimum threshold to 3 in the CD3_TotalSeqB selection (Figure 29)
Click the Classify icon then click Classify selection
Label the cells as Doublets
Click Save
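The reasoning for calling these cells doublets is that a single cell should not be positive for both a T cell marker and a B cell marker. A small sketch of that rule is shown below. The CD3 minimum of 3 comes from the step above; the CD19 minimum, the example values, and the `flag_doublets` helper are assumptions made for illustration only.

```python
from typing import List, Sequence


def flag_doublets(
    cd3: Sequence[float],
    cd19: Sequence[float],
    cd3_min: float = 3.0,   # threshold used in the tutorial step
    cd19_min: float = 3.0,  # hypothetical; the tutorial selects CD19 positive cells by lasso
) -> List[bool]:
    """Flag cells that are positive for both the T cell and the B cell marker."""
    if len(cd3) != len(cd19):
        raise ValueError("marker vectors must have one value per cell")
    return [a >= cd3_min and b >= cd19_min for a, b in zip(cd3, cd19)]


# Made-up protein values for four cells.
cd3_values = [0.2, 4.1, 0.0, 3.0]
cd19_values = [5.0, 4.7, 0.1, 0.5]

doublet_flags = flag_doublets(cd3_values, cd19_values)
print(doublet_flags)
```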
The biomarkers for clusters 1 and 2 also show an interesting pattern. Cluster 1 lists IGHD as its top biomarker, while cluster 2 lists IGHA1 as the fourth most significant. Both IGHD (Immunoglobulin Heavy Constant Delta) and IGHA1 (Immunoglobulin Heavy Constant Alpha 1) encode classes of the immunoglobulin heavy chain constant region. IGHD is part of IgD, which is expressed by mature B cells, and IGHA1 is part of IgA1, which is expressed by activated B cells. We can color the plot by both of these genes to visualize their expression.
Click, drag and drop IGHD from the biomarker table onto the Green (feature) configuration option on the UMAP plot on the right
Click, drag and drop IGHA1 from the biomarker table onto the Red (feature) configuration option on the UMAP plot on the right (Figure 30)
We can use the lasso tool to select and classify these populations.
Lasso around the IGHD positive cells (Figure 31)
In the Classify icon on the left, click Classify selection
Label the cells as Mature B cells
Click Save
Lasso around the IGHA1 positive cells (Figure 32)
In the Classify icon on the left, click Classify selection
Label the cells as Activated B cells
Click Save
We can now visualize our classifications.
Click the Clear filters link in the Select & Filter icon on the left
Select the duplicate UMAP plot (with the cells colored by marker genes)
Under Configure on the left, click the Style icon and color the cells by New classifications (Figure 33)
Click Apply classifications in the Classify icon
Name the attribute Cell type
Click Run
Click OK to close the message about a classification task being enqueued
The project includes Human Breast Cancer (In Situ Replicate 1) and Human Breast Cancer (In Situ Replicate 2) files in one project.
Obtain the Xenium Output Bundles (Figure 1) for each sample.
Navigate the options to select 10x Genomics Xenium Output Bundle as the file format for input. Choose to import 10x Genomics Xenium for your project (Figure 2).
Click Transfer files on the homepage, under settings, or during import.
You will need to decompress the Xenium Output Bundle zip file before it is uploaded to the server. After decompression, you can drag and drop the entire folder into the Transfer files dialog. All individual files in the folder will then be listed in the dialog, with no folder structure (Figure 4). The folder structure will be restored after the upload is completed.
The Xenium output bundle should be included for each sample (Figure 5). Each sample requires the whole sample folder or a folder containing these 6 files: cell_feature_matrix.h5, cells.csv.gz, cell_boundaries.csv.gz, nucleus_boundaries.csv.gz, transcripts.csv.gz, morphology_focus.ome.tif. Once added, the Cells and Features values will update. You can choose an annotation file during import that matches what was used to generate the feature count.
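Before uploading, it can be handy to confirm that a decompressed sample folder contains the six files listed above. The following is a small optional sketch using only the Python standard library; the `missing_xenium_files` helper is hypothetical and is not part of Partek Flow. The demonstration builds a temporary folder with placeholder files rather than real Xenium output.

```python
import tempfile
from pathlib import Path
from typing import List

# The six file names listed in the tutorial text.
REQUIRED_FILES = (
    "cell_feature_matrix.h5",
    "cells.csv.gz",
    "cell_boundaries.csv.gz",
    "nucleus_boundaries.csv.gz",
    "transcripts.csv.gz",
    "morphology_focus.ome.tif",
)


def missing_xenium_files(folder: str) -> List[str]:
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


# Demonstration with empty placeholder files (not real data).
with tempfile.TemporaryDirectory() as sample_dir:
    for name in REQUIRED_FILES[:4]:
        (Path(sample_dir) / name).touch()
    missing = missing_xenium_files(sample_dir)

print(missing)
```

An empty list means the folder has all six files and is ready to transfer.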
Do not limit cells by total read count, since Xenium data is targeted to fewer features.
Once the download completes, the sample table will appear in the Metadata tab, with one row per sample (Figure 6).
The sample table is pre-populated with one sample attribute: # Cells. Sample attributes can be added and edited manually by clicking Manage in the Sample attributes menu on the left. If a new attribute is added, click Assign values to assign samples to different groups. Alternatively, you can use the Assign values from a file option to assign sample attributes using a tab-delimited text file. For more information about sample attributes, see here. Cell attributes are found under Sample attributes and can be added by publishing cell attributes to a project.
For this tutorial, we do not need to edit or change sample attributes.
Space Ranger output files are pre-processed 10x Genomics Visium data. The steps covered here will show you how to import and continue analyses with this pre-processed data from the Space Ranger pipeline. Partek Flow refers to this high cellular resolution data as Single cell counts; each point (spot) can be 1-10 cell resolution depending on the tissue type*.
The project includes Human Colon Cancer (Replicate 1) and Human Colon Cancer (Replicate 2) output files in one project.
Obtain the filtered Count matrix files (h5 or HDF5) and Spatial outputs for each sample
The spatial imaging outputs should be in compressed format.
Navigate the options to select 10x Genomics Visium Space Ranger output as the file format for input
Click Transfer files on the homepage, under settings, or during import
Proceed to transfer files as shown below using the 10x Genomics Visium Space Ranger outputs importer.
Navigate to the appropriate files for each sample. Please note that the 10x Genomics Space Ranger output can be count matrix data as 1 filtered .h5 file per sample or sparse matrix files for each sample as 3 files (two .csv with one .mtx or two .tsv with one .mtx for each sample). The spatial output files should be in compressed format (.zip). The high resolution image can be uploaded and is optional.
Count matrix files and spatial outputs should be included for each sample. Once added, the Cells and Features values will update.
Once the download completes, the sample table will appear in the Metadata tab, with one row per sample.
The sample table is pre-populated with one sample attribute, # Cells. Sample attributes can be added and edited manually by clicking Manage in the Sample attributes menu on the left. If a new attribute is added, click Assign values to assign samples to different groups. Alternatively, you can use the Assign values from a file option to assign sample attributes using a tab-delimited text file. For more information about sample attributes, see here.
For this tutorial, we do not need to edit or change any sample attributes.
Follow along to add files to your Xenium project: Add files to the project.
Filter the data including control probes using the Filter features task
Choose Feature metadata filter
Include the Gene Expression features
Click Finish
This produces a Filter features task node (rectangle) and a results data node (circle).
Right click the circle to Rename data node to "Filtered to only gene expression"
Click the circular "Filtered to only gene expression" results node and select the Single cell QA/QC task from the context sensitive menu on the right
While the task is running, its node is semi-transparent and shows a progress bar. When the task completes, the node becomes opaque
Double click the opaque rectangle task to open and filter cells as described here. Apply the observation filter to the "Filtered to only gene expression" results node. This results in a "Filtered cells" node.
Select the "Filtered cells" node and choose the Normalization task from the Normalization and scaling drop-down in the task menu
Click the Use recommended button to proceed with these settings
Click Finish
This results in a Normalized counts node as shown below in the pipeline.
Click the Normalized counts data node
Expand the Exploratory analysis section of the task menu
Click PCA
In this tutorial we will modify the PCA task parameters so that the data is not split by sample, which keeps the cells from both samples on the same PCA output.
Uncheck (de-select) the Split by sample checkbox under Grouping
Click Finish
Double-click the circular PCA node to view the results
From this PCA node, further exploratory tasks can be performed (e.g. t-SNE, UMAP, and Graph-based clustering).
Choose Style under Configure
Under Color by, search for FASN by typing the name
Select FASN from the drop-down
The colors can be customized by selecting the color palette then using the color drop-downs as shown below.
Ensure the colors are distinguishable such as in the image above using a blue and green scale for Maximum and Minimum, respectively.
Click FASN in the legend to make it draggable (pale green background) and continue to drag and drop FASN to Add criteria within the Select & Filter Tool
Hover over the slider to see the distribution of FASN expression
Multiple gene thresholds can be used in this type of classification by performing this step with multiple markers.
Drag the slider to select the population of cells expressing high FASN (the cutoff here is 10 or the middle of the distribution).
Click Classify under Tools
Click Classify selection
Give the classification a name "FASN high"
Under the Select & Filter tool, choose Filter to exclude the selected cells
Exit all Tools and Configure options
Click the "X" in the right corner
Use the rectangle selection mode on the PCA to select all of the points on the image
This results in 147538 cells selected.
Open Classify
Click Classify selection and name this population of cells "FASN low"
Click Apply classifications and give the classification a name "FASN expression"
Now we will be able to use this classification in downstream applications (e.g. differential analysis).
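The outcome of these steps is a two-level attribute in which every cell is either "FASN high" or "FASN low". A minimal sketch of the equivalent rule is given below. The cutoff of 10 comes from the tutorial; the expression values, the `label_by_cutoff` helper, and the choice to treat the cutoff itself as "high" are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def label_by_cutoff(values: Sequence[float], cutoff: float, high: str, low: str) -> List[str]:
    """Label each cell `high` if its value is >= `cutoff`, otherwise `low`."""
    return [high if value >= cutoff else low for value in values]


# Made-up normalized FASN expression for five cells.
fasn_expression = [0.0, 12.5, 3.2, 10.0, 25.1]

fasn_labels = label_by_cutoff(fasn_expression, cutoff=10.0, high="FASN high", low="FASN low")
group_sizes = Counter(fasn_labels)

print(fasn_labels)
print(group_sizes)
```

Checking the group sizes before running a differential analysis is a quick way to confirm that neither group is too small to compare.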
Here we are starting with Space Ranger outputs as the Single cell counts node.
A basic example of a spatial data analysis, starting from the Single cell counts node, is shown below and is similar to a Single cell RNA-Seq analysis pipeline with the addition of the Spatial report task (shown) or Annotate Visium image task (not shown).
Note that QA/QC has not been performed in this example, to visualize all spots (points) on the tissue image. Single cell QA/QC can be performed from the Single cell counts node with the filtered cells applied to the Single cell counts before the Filter features task. Click here for more information on Single cell QA/QC (see the pipeline in Figure 11).
A context-sensitive menu will appear on the right side of the pipeline. Use the drop-downs in the toolbox to open available tasks for the selected data node.
Low-quality cells can be filtered out during the spatial data analysis using QA/QC and will not be viewed on the tissue image. Click here for more information on Single cell QA/QC. We will not perform Single cell QA/QC in this tutorial; this task would be invoked from the Single cell counts node and the Filter features task discussed below would be invoked from this output node (Filtered counts).
Remove gene expression counts that are not relevant to the analysis.
Click the Filtering drop-down in the toolbox
Click the Filter Features task
Choose Noise reduction
Exclude features where value <= 0.0 in at least 99.0% of the cells
Click Finish
Remove gene expression values that are zero in the majority of the cells.
A task node, Filtered counts, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Filter features task node to indicate that the task is running.
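The noise reduction rule used above (exclude features where value <= 0.0 in at least 99.0% of the cells) can be written out as a short sketch. This is an illustration of the rule as stated, not Partek Flow's implementation; the `noise_reduction_filter` helper, the feature names, and the counts are hypothetical.

```python
from typing import Dict, List


def noise_reduction_filter(
    matrix: Dict[str, List[float]],
    threshold: float = 0.0,
    min_fraction: float = 0.99,
) -> Dict[str, List[float]]:
    """Drop features whose value is <= `threshold` in at least `min_fraction` of cells."""
    kept = {}
    for feature, values in matrix.items():
        at_or_below = sum(1 for v in values if v <= threshold)
        if at_or_below / len(values) < min_fraction:
            kept[feature] = values
    return kept


# Made-up counts for 100 cells per feature.
matrix = {
    "GENE_A": [1.0] * 50 + [0.0] * 50,  # zero in 50% of cells: kept
    "GENE_B": [1.0] * 1 + [0.0] * 99,   # zero in 99% of cells: excluded
    "GENE_C": [1.0] * 2 + [0.0] * 98,   # zero in 98% of cells: kept
}

filtered = noise_reduction_filter(matrix)
print(sorted(filtered))
```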
Normalize (transform) the cells to account for variability between cells.
Select the Filtered Counts result node
Choose the Normalization task from the toolbox
Click Use recommended
Click Finish
Explore the data by dimension reduction and clustering methods.
Click the Normalized counts result node
Select the PCA task under Exploratory analysis in the toolbox
Unselect Split by Sample
Click Finish
The PCA result node generated by the PCA task can be visualized by double-clicking the circular node.
Single click the PCA result node
Select the Graph-based clustering task from the toolbox
Click Finish
The results of graph-based clustering can be viewed by PCA, UMAP, or t-SNE. Follow the steps outlined below to generate a UMAP.
Select the Graph-based clustering result node by single click
Select the UMAP task from the toolbox
Click Finish
Double-click the UMAP result node
The UMAP is automatically colored by the graph-based clustering result in the previous node. To change the color, click Style.
Classify the cells using Garnett automatic classification to determine cell types.
Click the Filtered counts node
From the Classification drop-down in the toolbox, select Classify cell type
Using the Managed classifiers, select the human Intestine Garnett classifier
Click Finish
The output of this task produces the Classify result node.
Double-click the Classify result node to view the cell count for each cell type and the top marker features for each cell type.
Publish cell attributes to the project to make this attribute accessible for downstream applications.
Click the Classify result node
Select Publish cell attributes to project under Annotation/Metadata
Name the cell attribute
Click Finish
Publish cell attributes can be applied to result nodes with cell annotation (e.g. click the graph-based clustering result node and follow the same steps).
An example of this completed task is shown below.
Since this attribute has been published, we can choose to right-click the Publish cell attributes to project node and remove this from the pipeline. This attribute will be managed in the Metadata tab (discussed below).
The name of the Cell attribute can be changed in the Metadata tab (right of the Analyses tab).
Click Manage
Click the Action dots
Choose Modify attribute
Rename the attribute Cell Type
Click Save
Click Back to metadata tab
Drag and drop the categories to rearrange their order. The order here will determine the plotted order and legend in visualizations.
We can use these Cell attributes in analysis tasks such as Statistics (e.g. differential analysis comparisons) as well as to Style the visualizations in the Data Viewer.
With the pre-processed samples imported, we can begin analysis.
Click Analyses to switch to the Analyses tab
For now, the Analyses tab has a starting node, a circular node called Single cell counts and also a rectangular task node called Spatial report which was automatically generated for this type of data. As you perform analyses, additional nodes representing tasks and new data will be created, forming a visual representation of your analysis pipeline.
Click the Spatial report node
Click Task report on the task menu
The spatial report will display the first sample (Replicate 1). We want to visualize all of the samples using the steps below.
Duplicate the plot by clicking the Duplicate plot button in the upper right controls (arrow 1)
Open the Axes configuration option (arrow 2)
Change the Sample on the duplicated image under Misc (arrow 3)
Each data point is a tissue spot. Duplicate and change the sample to view multiple samples.
If starting with unprocessed fastq files, the Annotate Visium image task will create a new result node, Annotated counts.
Double click on the Annotated counts node to invoke the Data Viewer showing data points overlaid on top of the microscopy image
Follow the steps outlined above by duplicating the image to visualize the multiple samples
To modify the points on the image to show more of the background image use the Style configuration option.
Press and hold Ctrl or Shift to select both plots
Click Style in the left panel
Move the Opacity slider to the left
Change the Point size to 3
Click Save in the left panel and give the session an appropriate name
Modify the axes to remove the X and Y coordinates from the tissue image.
Press and hold Ctrl or Shift to select both plots
Click Axes in the left panel
Toggle off Show lines for both the X & Y axis
Toggle off Show title and Show axis for both the X & Y axis
Style the image and color by normalized gene expression using three genes of interest.
Press and hold Ctrl or Shift to select both plots
Click Style
Select the Normalized counts node as the source
Choose to Color by Numeric triad
Use the Green drop-down to select IL32, Red drop-down to select DES, and Blue drop-down to select PTGDS genes (type in name of gene in drop-down)
Increase the Point size to 11
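To give an intuition for what coloring by a numeric triad means, the sketch below maps three expression vectors onto the red, green, and blue channels of each point. How Partek Flow scales the values internally is not described here, so the min-max scaling, the `scale_channel` and `triad_colors` helpers, and the numbers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def scale_channel(values: Sequence[float]) -> List[int]:
    """Min-max scale values to integers in 0..255 (all zeros if values are constant)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0 for _ in values]
    return [round(255 * (v - lowest) / (highest - lowest)) for v in values]


def triad_colors(
    red: Sequence[float], green: Sequence[float], blue: Sequence[float]
) -> List[Tuple[int, int, int]]:
    """Combine three per-spot expression vectors into one (R, G, B) color per spot."""
    return list(zip(scale_channel(red), scale_channel(green), scale_channel(blue)))


# Made-up normalized expression for three spots.
des = [0.0, 1.0, 4.0]     # red channel
il32 = [1.0, 1.0, 1.0]    # green channel (constant in this toy example)
ptgds = [10.0, 0.0, 4.0]  # blue channel

spot_colors = triad_colors(des, il32, ptgds)
print(spot_colors)
```

Spots that express two of the genes show a blended color, which is what makes this view useful for spotting co-expression on the tissue image.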
To color by the Cell attribute "Cell Type" which we previously determined in this tutorial, use the Color by drop-down and select Cell Type. Cell type is a blue categorical attribute while green attributes are numerical.
We will compare the classification (FASN expression) we previously made based on expression levels of the FASN gene. Here, we will compare FASN high and FASN low cells to identify genes and pathways.
Select the Normalized counts node and choose Compute biomarkers from the Statistics drop-down
Choose the "FASN expression" attribute
Do not select Split by sample
Click Finish
This results in a Biomarkers report.
Double-click the Biomarkers results node to open the report
The top features are reported for the comparison.
Click your username in the top right corner
Select Settings from the drop-down
Choose Lists from the Components drop-down in the menu on the left
Use the + New list button to add these 10 genes
Choose Text as the list option
Give the list a Name and Description
Enter the 10 genes in column format as shown below
Click Add list
The list has been added and can now be used for further analysis. The Actions button can be used to modify this list if necessary, as shown below.
Go to the Analyses tab
Select the Normalized counts node
Choose Gene set enrichment from the Biological interpretation drop-down in the task menu
Use the KEGG database for pathway enrichment
Check Specify background gene list
Select "Top 10 FASN high Features" as the Background gene list
Click Finish
This results in a Pathway enrichment report, as shown below.
Double-click the report to view the pathways involved in this list of genes
The tutorial data set is available through Partek Flow.
Click your avatar (Figure 1)
Click Settings
On the System information page, the Download tutorial data section includes pre-loaded data sets used by Partek Flow tutorials (Figure 2).
Click Single cell glioma (multi-sample)
The tutorial data set will be downloaded onto your Partek Flow server and a new project, Glioma (multi-sample), will be created. You will be directed to the Data tab of the new project. Because this is a tutorial project, there is no need to click on Import data, as the import is handled automatically (Figure 3).
You can wait a few minutes for the download to complete, or check the download progress by selecting Queue then View queued tasks... to view the Queue (Figure 4).
Once the download completes, the sample table will appear in the Data tab, with one row per sample (Figure 5).
For this tutorial, we do not need to edit or change any sample attributes.
With samples imported and annotated, we can begin analysis.
Click Analyses to switch to the Analyses tab
For now, the Analyses tab has only a single node, Single cell counts. As you perform the analysis, additional nodes representing tasks and new data will be created, forming a visual representation of your analysis pipeline.
Click on the Single cell counts node
A context-sensitive menu will appear on the right-hand side of the pipeline (Figure 9). This menu includes tasks that can be performed on the selected counts data node.
An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few counts to be analyzed.
Expand the QA/QC section of the task menu
Click on Single cell QA/QC (Figure 6)
A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running.
Click the Single cell QA/QC node once it finishes running
Click Task report on the task menu (Figure 7)
The Single cell QA/QC report opens in a new data viewer session. There are interactive violin plots showing the most commonly used quality metrics for each cell from all samples combined (Figure 8). For this data set, there are two relevant plots: the total count per cell and the number of detected genes per cell. Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Typically, there is a third plot showing the percentage of mitochondrial counts per cell, but mitochondrial transcripts were not included in the data set by the study authors, so this plot is not informative for this data set.
Remove the % mitochondrial counts and the extra text box in the bottom right by clicking Remove plot in the top right corner of each plot (Figure 8).
The plots are highly customizable and can be used to explore the quality of cells in different samples.
Click on Single cell counts in the Get Data icon on the left (Figure 9)
Click and drag the Sample name attribute onto the Counts plot and drop it onto the X-axis
Repeat this for the Detected genes plot
The cells are now separated into different samples along the x-axis (Figure 10)
Hold Control and left-click to select both plots
Open the Style icon on the left under Configure
Under Color, use the slider to reduce the Opacity
Open the Axis icon on the left
Adjust the X-rotation on the plots to 90
Note how both plots were modified at the same time.
Cells can be selected by setting thresholds using the Select & Filter tool. Here, we will select cells based on the total count.
Open Select & Filter under Tools on the left
Under Criteria, click Pin histogram to see the distribution of counts
Set the Counts thresholds to 8000 and 20500
Selected cells will be in blue and deselected cells will be dimmed (Figure 11).
Because this data set was already filtered by the study authors to include only high-quality cells, this count filter is sufficient.
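The count filter amounts to keeping cells whose total count lies between the two thresholds. A brief sketch follows, using the thresholds from the step above. The example totals and the `within_count_range` helper are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from typing import List, Sequence


def within_count_range(total_counts: Sequence[float], low: float = 8000, high: float = 20500) -> List[bool]:
    """True for cells whose total count is between `low` and `high` (inclusive)."""
    return [low <= count <= high for count in total_counts]


# Made-up total counts for five cells.
total_counts = [7500, 8000, 15000, 20500, 31000]

keep = within_count_range(total_counts)
kept_counts = [count for count, flag in zip(total_counts, keep) if flag]

print(keep)
print(kept_counts)
```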
Click Apply observation filter
Click the Single cell counts data node in the pipeline preview (Figure 12)
Click Select
A new task, Filter counts, is added to the Analyses tab. This task produces a new Filter counts data node (Figure 13).
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
Most tasks can be queued up on data nodes that have not yet been generated, so you can wait for the filtering step to complete, or proceed to the next section.
A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes. Because there is no gold standard for what makes a gene informative or not, ideal gene filtering criteria depend on your experimental design and research question. Thus, Partek Flow has a wide variety of flexible filtering options.
Click the Filter counts node produced by the Filter counts task
Click Filtering in the task menu
Click Filter features (Figure 14)
There are four categories of filter available - noise reduction, statistics based, feature metadata, and feature list (Figure 15).
The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.
We will use a noise reduction filter to exclude genes that were included in the matrix file but are expressed in very few or no cells in the data set.
Click the Noise reduction filter checkbox
Set the Noise reduction filter to Exclude features where value <= 0 in 99% of cells using the drop-down menus and text boxes (Figure 16)
Click Finish to apply the filter
This produces a Filtered counts data node. This will be the starting point for the next stage of analysis - identifying cell types in the data using the interactive t-SNE plot.
We are omitting normalization in this tutorial because the data has already been normalized.
The tutorial data set is taken from a published study and has already been normalized using TPM (Transcripts per million), which normalizes for the length of feature and total reads, and transformed as log2(TPM/10+1). This normalization and transformation scheme can be performed in Partek Flow, along with other commonly used RNA-Seq data normalization methods.
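For readers who want to see the arithmetic, here is a minimal sketch of TPM followed by the log2(TPM/10+1) transformation described above. It uses the standard TPM definition (length-normalized rates rescaled to sum to one million); the gene counts and lengths are invented, and the `tpm` and `log2_tpm_transform` helper names are hypothetical rather than Partek Flow functions.

```python
import math
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_kb: Sequence[float]) -> List[float]:
    """Transcripts per million: divide by feature length, then scale to sum to 1e6."""
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


def log2_tpm_transform(tpm_values: Sequence[float]) -> List[float]:
    """Apply the transformation log2(TPM / 10 + 1) to each value."""
    return [math.log2(value / 10 + 1) for value in tpm_values]


# Invented read counts and feature lengths (in kilobases) for three genes in one cell.
counts = [100, 200, 700]
lengths_kb = [1.0, 2.0, 7.0]

tpm_values = tpm(counts, lengths_kb)
transformed = log2_tpm_transform(tpm_values)

print(tpm_values)
print(log2_tpm_transform([0, 10, 30]))  # [0.0, 1.0, 2.0]
```

In this toy example the three genes have different raw counts but the same count per kilobase, so they receive the same TPM, which shows the length normalization at work.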
For your convenience, here is a video showing the below steps.
This guide illustrates how to process FASTQ files produced using the 10x Genomics Chromium Single Cell ATAC assay to obtain a Single cell counts data node, which is the starting point for analysis of single-cell ATAC experiments.
We recommend uploading your FASTQ files (fastq.gz) to a folder on your Partek Flow server before importing them into a project. Data files can be transferred into Flow from the Home page by clicking the Transfer file button (Figure 1). Follow the instructions in Figure 1 to complete the data transfer. Users have the option to change the Upload directory by clicking the Browse button and either selecting another existing directory or creating a new directory.
To create a new project, from the Home page click the New Project button; enter a project name and then click Create project. Once a new project has been created, click the Add data button in the Analyses tab.
After clicking Add data, select fastq in the Single cell > scATAC-Seq section and click Next. The file browser interface will open (Figure 3). Select the FASTQ files using the file browser interface and click the Finish button to complete the task. Paired-end reads will be automatically detected, and multiple lanes for the same sample will be automatically combined into a single sample. We encourage users to include all the FASTQ files, including the index files, although the index files are optional.
When the FASTQ files have finished importing, the Unaligned reads data node will appear in the Analyses tab.
To process single cell ATAC-seq FASTQ data, Partek Flow wraps the 'cellranger-atac count' pipeline from Cell Ranger ATAC v2.0[1]. It takes FASTQ files and performs multiple analyses in a single task, including read filtering and alignment, barcode counting, identification of transposase cut sites, and peak and cell calling, and it generates the count matrix.
To run the Cell Ranger - ATAC task:
Click the Unaligned reads data node
Select Cell Ranger - ATAC in the 10x Genomics section in the task menu on the right
Select Single cell ATAC in Assay type for ATAC-Seq data only
Choose the proper Reference assembly for the data (you may have to create the reference)
Press the Finish button to run the task with default settings (Figure 4)
The output count matrix then becomes the starting point for downstream analysis of scATAC-seq data in Flow (Figure 5).
An important step in analyzing single cell ATAC data is to filter out low quality cells. A few examples of low-quality cells are doublets, cells with a low TSS enrichment score, cells with a high proportion of reads mapping to the genomic blacklist regions, or cells with too few reads to be analyzed. Users are able to do this in Partek Flow using the Single cell QA/QC task.
Click on the Single cell counts node
Click on the QA/QC section in the task menu
Click on Single cell QA/QC
A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running (Figure 5).
Click the Single cell QA/QC node once it finishes running
Click Task report in the task menu
The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 6).
There are five plots: Nucleosome signal, TSS enrichment, % reads in peaks, Blacklist ratio, and Peak region fragments. Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Cells can be filtered either by clicking and dragging to select a region on one of the plots or by setting thresholds using the filters below the plots. Here, we will apply a filter for the number of read counts. The plot will be shaded to reflect the filter. Cells that are excluded will be shown as black dots on both plots.
Descriptions of QC metrics:
Nucleosome signal: Calculated per cell; quantifies the approximate ratio of mononucleosomal to nucleosome-free fragments. The histogram of DNA fragment sizes (determined from the paired-end sequencing reads) should exhibit a strong nucleosome banding pattern corresponding to the length of DNA wrapped around a single nucleosome.
Peak region fragments: total number of fragments in peaks which is a measure of cellular sequencing depth/complexity. Cells with very few reads may need to be excluded due to low sequencing depth. Cells with extremely high levels may represent doublets, nuclei clumps, or other artifacts.
% reads in peaks: Represents the fraction of all fragments that fall within ATAC-seq peaks. Cells with low values (i.e. <15-20%) often represent low-quality cells or technical artifacts that should be removed. Note that this value can be sensitive to the set of peaks used.
Blacklist ratio: The ENCODE project has provided a list of blacklist regions, which are genomic regions whose reads are often associated with artifactual signal. Cells with a high proportion of reads mapping to these regions (compared to reads mapping to peaks) often represent technical artifacts and should be removed.
To filter out low quality cells (Figure 7),
Open the Select & Filter menu
Set the filters to Nucleosome signal < 4 and Peak region fragments 500-30000; leave the rest as they are
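The thresholds above amount to a simple per-cell test. The sketch below shows that logic in Python under stated assumptions: the dictionary keys and cell barcodes are hypothetical, and the fragment range is treated as inclusive at both ends, which the tutorial does not specify.

```python
def passes_qc(cell, max_nucleosome_signal=4.0, min_fragments=500, max_fragments=30000):
    """Return True if a cell meets the QC thresholds used in this tutorial.

    cell: dict with hypothetical keys 'nucleosome_signal' and
    'peak_region_fragments'.
    """
    return (
        cell["nucleosome_signal"] < max_nucleosome_signal
        and min_fragments <= cell["peak_region_fragments"] <= max_fragments
    )


cells = [
    {"barcode": "cell_1", "nucleosome_signal": 0.8, "peak_region_fragments": 4200},
    {"barcode": "cell_2", "nucleosome_signal": 5.1, "peak_region_fragments": 3900},   # high nucleosome signal
    {"barcode": "cell_3", "nucleosome_signal": 1.1, "peak_region_fragments": 120},    # too few fragments
    {"barcode": "cell_4", "nucleosome_signal": 0.9, "peak_region_fragments": 52000},  # possible doublet
]
kept = [c["barcode"] for c in cells if passes_qc(c)]
print(kept)  # ['cell_1']
```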
Another common task is to filter the data to include only informative features. Partek Flow has a wide variety of flexible filtering options.
The Filter features task can be invoked from any counts or single cell data node. The Noise reduction and Statistics based filters take each feature and perform the specified calculation across all the cells. The filter is applied to the values in the selected data node, and the output is a filtered version of the input data node.
In the task dialog, click the check box to activate one or more of the filter types, configure the filter(s), and click Finish to run (Figure 8).
To understand the role of enriched regions in regulating gene expression, Flow uses the Annotate regions task to add information about overlapping or nearby genomic features. This gives regulatory context for the enriched regions.
The input for Annotate regions is a Peaks type data node.
Click the Filtered features data node
Click the Peak analysis section in the toolbox
Click Annotate regions
Set the Genomic overlaps parameter
The Genomic overlaps parameter lets you choose one of two options (Figure 9).
Report one gene region per peak (precedence applies) chooses one gene section for each peak using the precedence order to settle cases where more than one gene section overlaps a peak. The order of precedence is TSS, TTS, CDS Exon, 5' UTR Exon, 3' UTR Exon, Intron, Intergenic.
Report all gene regions per peak creates a row for each gene section that overlaps a peak in the task report.
Users can define the transcription start site (TSS) and transcription termination site (TTS) limits in base pairs (bp).
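The difference between the two Genomic overlaps options can be illustrated with a short Python sketch. The precedence order is taken from the text above; the function names, peak identifiers, and input layout are hypothetical, and this is not the code Flow runs.

```python
# Order of precedence as listed in the tutorial (highest first)
PRECEDENCE = ["TSS", "TTS", "CDS Exon", "5' UTR Exon", "3' UTR Exon", "Intron", "Intergenic"]
RANK = {section: i for i, section in enumerate(PRECEDENCE)}


def report_one_region(overlaps):
    """Pick the single gene section with the highest precedence."""
    return min(overlaps, key=lambda section: RANK[section])


def annotate(peaks, one_per_peak=True):
    """peaks: dict mapping a peak ID to the gene sections it overlaps."""
    rows = []
    for peak_id, overlaps in peaks.items():
        if one_per_peak:
            rows.append((peak_id, report_one_region(overlaps)))
        else:
            # One row per overlapping gene section
            rows.extend((peak_id, section) for section in overlaps)
    return rows


peaks = {"peak_1": ["Intron", "TSS"], "peak_2": ["3' UTR Exon", "CDS Exon"]}
print(annotate(peaks))                       # [('peak_1', 'TSS'), ('peak_2', 'CDS Exon')]
print(len(annotate(peaks, one_per_peak=False)))  # 4
```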
Choose a gene/feature annotation from the drop-down menu
Click Finish to run
TF-IDF normalization in Flow can be invoked from the Normalization and scaling section after clicking any single cell counts data node (Figure 10).
To run TF-IDF normalization,
Click a Single cell counts data node, in this case the Annotated regions node
Click the Normalization and scaling section in the toolbox
Click TF-IDF normalization
The output of TF-IDF normalization is a new data node that has been normalized by log(TF x IDF).
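As a rough illustration of what a log(TF x IDF) normalization does, here is a small Python sketch. Several details are assumptions rather than facts about Flow: TF is taken as a peak's count divided by the total counts in that cell, IDF as the number of cells divided by the peak's total count across cells, and the sketch uses log(1 + TF x IDF x scale) with a scale factor of 10,000 so that zero counts stay at zero. Flow's exact formula may differ.

```python
import math


def tf_idf(counts, scale=10_000):
    """TF-IDF normalize a cells x peaks count matrix (list of rows).

    The definitions of TF and IDF, the scale factor, and the use of log1p
    are assumptions made for this sketch.
    """
    n_cells = len(counts)
    n_peaks = len(counts[0])
    cell_totals = [sum(row) for row in counts]
    peak_totals = [sum(row[j] for row in counts) for j in range(n_peaks)]
    normalized = []
    for row, cell_total in zip(counts, cell_totals):
        new_row = []
        for value, peak_total in zip(row, peak_totals):
            if cell_total == 0 or peak_total == 0:
                new_row.append(0.0)  # avoid division by zero for empty cells/peaks
                continue
            tf = value / cell_total     # term frequency within the cell
            idf = n_cells / peak_total  # inverse frequency of the peak
            new_row.append(math.log1p(tf * idf * scale))
        normalized.append(new_row)
    return normalized


matrix = [[1, 0], [1, 2]]
print(tf_idf(matrix))
```

Peaks that are open in many cells receive a smaller IDF weight, so rarer peaks contribute relatively more to downstream dimension reduction.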
Singular value decomposition (SVD) is applied to the TF-IDF output for scATAC-Seq data and returns a reduced-dimension representation of the matrix. Although SVD and principal components analysis (PCA) are different techniques, they are closely connected: PCA can be computed as an application of the SVD. Users who are more familiar with scRNA-Seq can think of the SVD output as analogous to the output of PCA. Similarly, the singular values can be interpreted statistically in terms of the variance in the data explained by each component.
To run the SVD task,
Click a Normalized counts data node
Click the Exploratory analysis section in the toolbox
Click SVD
The SVD dialog only asks for the number of singular values to compute (Figure 11). By default, 100 singular values are computed rather than all of them; the number can be adjusted or typed in directly. Click the Finish button to run the task with default settings.
The task report for SVD is similar to that of PCA. Its output will be used for downstream analysis and visualization, including Harmony and WNN.
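The link between singular values and explained variance can be made concrete with a small calculation. The sketch below assumes the singular values have already been computed, and it reports each component's share relative to the components supplied; if only the top 100 are computed, these are shares of the retained components, not of the full data. The function names are hypothetical.

```python
def variance_explained(singular_values):
    """Fraction of (retained) variance attributed to each component.

    Each component's share is proportional to its squared singular value.
    """
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [v / total for v in squared]


def cumulative(fractions):
    """Running total, useful for choosing how many components to keep."""
    running, out = 0.0, []
    for f in fractions:
        running += f
        out.append(running)
    return out


fractions = variance_explained([3.0, 4.0])
print(fractions)  # squared values 9 and 16 out of 25
```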
Graph-based clustering (Figure 12) identifies groups of similar cells using SVD values as the input. By including only the informative SVD components, noise in the data set is excluded, improving the results of clustering.
Click the SVD output
Click Exploratory analysis in the task menu
Click Graph-based clustering
Check Compute biomarkers
Click Finish to run as default
A new Graph-based clusters data node and a Biomarkers data node will be generated.
Double-click the Graph-based clusters node to see the cluster results and statistics (Figure 13)
Double-click the Biomarkers node to see the computed biomarkers if you have selected this option (Figure 14)
The Graph-based clustering result (Figure 13) lists the total number of clusters and the proportion of cells that fall into each cluster, as well as the Maximum modularity, a measure of clustering quality for which the optimal value is 1. The Biomarkers report (Figure 14) includes the top features for each graph-based cluster. It displays the top 10 genes that distinguish each cluster from the others. The Download option at the bottom right of the table can be used to view and save more features. Biomarkers are calculated using an ANOVA test comparing the cells in each cluster to all the other cells, filtering to genes that are at least 1.5-fold upregulated, and sorting by ascending p-value. This ensures that the top 10 genes of each cluster are highly and disproportionately expressed in that cluster.
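The ranking step of the biomarker calculation (fold-change filter, then sort by p-value) is easy to sketch. The example below assumes the ANOVA has already produced a fold change and p-value per gene for one cluster; the function name, dictionary keys, and gene names are hypothetical.

```python
def top_biomarkers(results, min_fold_change=1.5, top_n=10):
    """Rank candidate biomarkers for one cluster.

    results: list of dicts with hypothetical keys 'gene', 'fold_change'
    (cluster vs. all other cells) and 'p_value' from a prior ANOVA.
    """
    # Keep only genes upregulated by at least the fold-change cutoff
    upregulated = [r for r in results if r["fold_change"] >= min_fold_change]
    # Most significant first
    upregulated.sort(key=lambda r: r["p_value"])
    return [r["gene"] for r in upregulated[:top_n]]


cluster_results = [
    {"gene": "GENE_A", "fold_change": 3.2, "p_value": 1e-8},
    {"gene": "GENE_B", "fold_change": 1.2, "p_value": 1e-12},  # significant but below 1.5-fold
    {"gene": "GENE_C", "fold_change": 2.0, "p_value": 1e-10},
    {"gene": "GENE_D", "fold_change": -2.5, "p_value": 1e-9},  # downregulated
]
print(top_biomarkers(cluster_results))  # ['GENE_C', 'GENE_A']
```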
Similar to t-SNE, Uniform Manifold Approximation and Projection (UMAP) is a dimensional reduction technique. UMAP aims to preserve the essential high-dimensional structure and present it in a low-dimensional representation. UMAP is particularly useful for visually identifying groups of similar samples or cells in large high-dimensional data sets.
To run UMAP (Figure 15):
Click the SVD data node
Click the Exploratory analysis section of the toolbox
Click UMAP
Click Finish to run with default settings
UMAP produces a UMAP task node. Opening the task report launches a scatter plot showing the UMAP results. Each point on the plot is a cell for single cell data. The plot will open in 2D or 3D depending on the user preference.
The Annotate regions task in Flow labels individual peaks as promoters for a particular gene if the peak falls 1000 bases upstream from a gene's transcription start site, or 100 bases downstream from a gene's transcription start site by default (Figure 9). A promoter sum for a given gene is the number of cut sites per cell that fall within all the peaks labeled as promoters (-1000bp ~ 100bp by default or user defined through Annotate regions) for that gene. Higher promoter sum values indicate higher chromatin accessibility in the promoter region [4].
The Promoter sum matrix task in Flow summarizes the promoter sums and outputs a cell x gene matrix. Only genes that have peaks within their promoter regions are included in the matrix. Promoter sum matrix can be invoked from the Peak analysis section after clicking the Annotated regions data node (Figure 16).
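To make the promoter sum definition concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Flow's implementation: a peak is treated as a promoter peak if it overlaps the promoter window at all, the window is flipped for genes on the minus strand, and all names, coordinates, and input layouts are hypothetical.

```python
def promoter_window(tss, strand, upstream=1000, downstream=100):
    """Promoter interval around a TSS (defaults: -1000 bp to +100 bp)."""
    if strand == "+":
        return tss - upstream, tss + downstream
    # On the minus strand, upstream lies at higher coordinates
    return tss - downstream, tss + upstream


def promoter_sum(peaks, cut_sites, genes, upstream=1000, downstream=100):
    """Build a cell x gene promoter sum table.

    peaks: {peak_id: (chrom, start, end)}
    cut_sites: {cell: {peak_id: cut site count}}
    genes: {gene: (chrom, tss, strand)}
    """
    gene_peaks = {}
    for gene, (chrom, tss, strand) in genes.items():
        lo, hi = promoter_window(tss, strand, upstream, downstream)
        hits = [p for p, (c, s, e) in peaks.items() if c == chrom and s <= hi and e >= lo]
        if hits:  # genes without promoter peaks are left out of the matrix
            gene_peaks[gene] = hits
    return {
        cell: {g: sum(counts.get(p, 0) for p in hits) for g, hits in gene_peaks.items()}
        for cell, counts in cut_sites.items()
    }


genes = {"G1": ("chr1", 5000, "+"), "G2": ("chr1", 20000, "-"), "G3": ("chr2", 100000, "+")}
peaks = {"p1": ("chr1", 4200, 4600), "p2": ("chr1", 5050, 5300),
         "p3": ("chr1", 20500, 20900), "p4": ("chr1", 9000, 9500)}
cut_sites = {"cell_A": {"p1": 2, "p2": 1, "p4": 5}, "cell_B": {"p3": 4}}
print(promoter_sum(peaks, cut_sites, genes))
```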
To run Promoter sum matrix in Flow,
Click the Annotated regions data node
Click the Peak analysis section in the toolbox
Click Promoter sum matrix
Once the task has finished, a new data node is produced in which the promoter sum value for each feature can be used to color the UMAP/t-SNE plot and to determine cell types from the raw data. As with scRNA-seq data, we recommend normalizing this output before using it to color the UMAP.
Double-clicking the UMAP task node will open the task report in the Data Viewer.
To classify a cell, just select it then click Classify selection in the Classify tool.
For example, we can classify a cluster of cells expressing high levels of MS4A1 as B cells.
Make sure the right data source has been selected. For scATAC-seq data, this should be the normalized counts of promoter sum values in most cases (Figure 17).
Set Color by in the Style configuration to the normalized counts node
Type MS4A1 in the search box and select it. Rotate the 3D plot if you need to see this cluster more clearly.
Draw a lasso around the cluster of MS4A1-expressing cells
Click Classify selection under Tools in the left panel
Type B cells for the Name
Click Save (Figure 18)
Repeat the above steps to finish the other cell type classifications. To be able to use the classifications in downstream tasks and visualizations, you must first apply them.
Click Apply classifications
Name the classification (e.g. Cell type)
Check Compute biomarkers if needed
Click Run to complete the task
Once the classifications have been added to the project, one can color the UMAP/t-SNE plot by the Classification or compare the differentially expressed genes between different cell types.
To identify genes that distinguish a cell type, one can use the differential analysis tools in Partek Flow.
Click the TF-IDF normalized counts data node
Click the Differential analysis section in the toolbox
Click Hurdle model
Select the factors and interactions to include in the statistical test (Figure 19). Cell type has been selected here as an example.
Click Next
Define comparisons between factor or interaction levels (Figure 20)
Click Add comparison to add the comparison to the Comparisons table.
Click Finish to run the statistical test as default
A filtered Feature list data node can be produced by running the Differential analysis filter in the Hurdle model task report (Figure 21).
Differential expression analysis can be used to compare cell types. Here, we will compare glioma and oligodendrocyte cells to identify genes differentially regulated in glioma cells from the oligodendroglioma subtype. Glioma cells in oligodendroglioma are thought to originate from oligodendrocytes, thus directly comparing the two cell types will identify genes that distinguish them.
To analyze only the oligodendroglioma subtype, we can filter the samples.
Click the Filtered counts data node
Expand Filtering in the task menu
Click Filter cells (Figure 1)
The filter lets us include or exclude samples based on sample ID and attribute.
Set the filter to Include samples where Subtype is Oligodendroglioma
Click AND
Set the second filter to exclude Cell type (multi-sample) is Microglia
Click Finish to apply the filter (Figure 2)
A Filtered counts data node will be created with only cells that are from oligodendroglioma samples (Figure 3).
Click the new Filtered counts data node
Click Statistics > Differential analysis in the task menu
Click GSA
The configuration options (Figure 4) include sample and cell-level attributes. Here, we want to compare different cell types, so we will include Cell type (multi-sample).
Click Cell type (multi-sample)
Click Next
Next, we will set up a comparison between glioma and oligodendrocyte cells.
Click Glioma in the top panel
Click Oligodendrocytes in the bottom panel
Click Add comparison (Figure 5)
This will set up fold calculations with glioma as the numerator and oligodendrocytes as the denominator.
Click Finish to run the GSA
A green GSA data node will be generated containing the results of the GSA.
Double-click the green GSA data node to open the GSA report
Because of the large number of cells and large differences between cell types, the p-values and FDR step up values are very low for highly significant genes. We can use the volcano plot to preview the effect of applying different significance thresholds.
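For readers curious about where FDR step up values come from, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is the procedure this name usually refers to. That correspondence is stated here as an assumption; the function name and example p-values are hypothetical.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(fdr_step_up([0.01, 0.04, 0.03, 0.005]))
```

Because each p-value is scaled by the number of tests divided by its rank, the adjusted values are never smaller than the raw p-values.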
Open the Style icon on the left, change Size point size to 6
Open the Axes icon on the left and change the Y-axis to FDR step up (Glioma vs Oligodendrocytes)
Open the Statistics icon and change the Significance of X threshold to -10 and 10 and the Y threshold to 0.001
Open the Select & Filter icon, set the Fold change thresholds to -10 and 10
Note these changes in the icon settings and volcano plot below (Figure 6).
We can now recreate these conditions in the GSA report filter.
Click the GSA report tab in your web browser to return to the GSA report
Click FDR step up
Set the FDR step up filter to Less than or equal to 0.001
Press Enter
Click Fold change
Set the Fold change filter to From -10 to 10
Press Enter
The filter should include 291 genes.
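The two filters applied above can be expressed as a short Python sketch. It rests on two assumptions that should be checked against the software: fold change is reported on a signed scale (ratios below 1 are shown as the negative reciprocal, so a ratio of 0.1 becomes -10), and the "From -10 to 10" setting excludes fold changes strictly inside that range. The function names, dictionary keys, and gene names are hypothetical.

```python
def signed_fold_change(numerator_mean, denominator_mean):
    """Signed fold change, e.g. glioma (numerator) vs. oligodendrocytes (denominator).

    Assumes both means are positive and on a linear (not log) scale.
    """
    ratio = numerator_mean / denominator_mean
    return ratio if ratio >= 1 else -1 / ratio


def passes_filter(row, max_fdr=0.001, fc_low=-10, fc_high=10):
    """Keep genes with FDR <= max_fdr whose fold change is outside (fc_low, fc_high)."""
    outside_range = not (fc_low < row["fold_change"] < fc_high)
    return row["fdr"] <= max_fdr and outside_range


rows = [
    {"gene": "GENE_A", "fdr": 0.0005, "fold_change": 12.0},
    {"gene": "GENE_B", "fdr": 0.0005, "fold_change": 5.0},    # fold change too small
    {"gene": "GENE_C", "fdr": 0.01, "fold_change": 40.0},     # not significant enough
    {"gene": "GENE_D", "fdr": 0.0001, "fold_change": -15.0},
]
print([r["gene"] for r in rows if passes_filter(r)])  # ['GENE_A', 'GENE_D']
```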
To visualize the results, we can generate a hierarchical clustering heatmap.
Click the Filtered feature list produced by the Differential analysis filter task
Click Exploratory analysis in the task menu
Click Hierarchical clustering/heatmap
Using the hierarchical clustering options we can choose to include only cells from certain samples. We can also choose the order of cells on the heatmap instead of clustering. Here, we will include only glioma cells and order the samples by sample name (Figure 7).
Make sure Cluster is unchecked for Cell order
Click Filter cells under Filtering and set the filter to include Cell type (multi-sample) is Glioma
Choose Sample name from the Cell order drop-down menu in the Assign order section
Click Finish
Double click the green Hierarchical clustering node to open the heatmap
The heatmap differences may be hard to distinguish at first; the range from red to blue with a white midpoint is set very wide because of a few outlier cells. We can adjust the range to make more subtle differences visible. We can also adjust the color.
Set the Range toggle Min to -1.5
Set the Range toggle Max to 1.5
The heatmap now shows clear patterns of red and blue.
Click Axis titles and deselect the Row labels and Column labels of the panel to hide sample and feature names, respectively.
Select Sample name from the Annotations drop-down menu
Cells are now labeled with their sample name. Interestingly, samples show characteristic patterns of expression for these genes (Figure 8).
Click Glioma (multi-sample) to return to the Analyses tab.
We can use gene set enrichment to further characterize the differences between glioma and oligodendrocyte cells.
Click the Filtered feature list node
Click Biological interpretation in the task menu
Click Gene set enrichment
Change Database to Gene set database and click Finish to continue with the most recent gene set (Figure 9)
A Gene set enrichment node will be added to the pipeline.
Double-click the Gene set enrichment task node to open the task report
Top GO terms in the enrichment report include "ensheathment of neurons" and "axon ensheathment" (Figure 10), which corresponds well with the role of oligodendrocytes in creating the myelin sheath that supports and protects axons in the central nervous system.
t-SNE (t-distributed stochastic neighbor embedding) is a visualization method commonly used to analyze single-cell RNA-Seq data. Each cell is shown as a point on the plot and each cell is positioned so that it is close to cells with similar overall gene expression. When working with multiple samples, a t-SNE plot can be drawn for each sample or all samples can be combined into a single plot. Viewing samples individually is the default in Partek Flow because sample to sample variation and outlier samples can obscure cell type differences if all samples are plotted together. However, as you will see in this tutorial, in some data sets, cell type differences can be visualized even when samples are combined.
Using the t-SNE plot, cells can be classified based on clustering results and differences in expression of key marker genes.
Prior to performing t-SNE, it is a good idea to reduce the dimensionality of the data using principal components analysis (PCA).
Click the Filtered counts data node after the Filter features task
Select PCA from the Exploratory analysis section of the task menu (Figure 1)
Click Finish to run PCA with default settings (Figure 2)
Note, the default settings include the Split by sample checkbox being selected. This means that the dimensionality reduction will be performed on each sample separately.
PCA task and data nodes will be generated.
Click the PCA data node
Select t-SNE from the Exploratory analysis section of the task menu (Figure 3)
Click Finish from the t-SNE dialog to run t-SNE with the default settings (Figure 4)
Because the upstream PCA task was performed separately for each sample, the t-SNE task will also be performed separately for each sample. t-SNE task and data nodes will be generated (Figure 5).
Once the t-SNE task has completed, we can view the t-SNE plots
Click the t-SNE node
Click Task report from the task menu or double click the t-SNE node
The t-SNE will open in a new data viewer session. The t-SNE plot for the first sample in the data set, MGH36 (Figure 6), will open on the canvas. Please note that the appearance of the t-SNE plot may differ each time it is drawn so your t-SNE plots may look different than those shown in this tutorial. However, the cell-to-cell relationships indicated will be the same.
The t-SNE plot is in 3D by default. To change the default, click your avatar in the top right, choose Settings > My Preferences, and in your graphics preferences change the default scatter plot format from 3D to 2D.
You can rotate the 3D plot by left-clicking and dragging your mouse. You can zoom in and out using your mouse wheel. The 2D t-SNE is also calculated and you can switch between the 2D and 3D plots on the canvas. We will do this later on in the tutorial.
Each sample has its own plot. We can switch between samples.
Open the Axes icon on the left under Configure (Figure 7)
Navigate to Misc
The t-SNE plot has switched to show the next sample, MGH42 (Figure 7).
The goal of this analysis is to compare malignant cells from two different glioma subtypes, astrocytoma and oligodendroglioma. To do this, we need to identify the malignant cells we want to include and which cells are the normal cells we want to exclude.
The t-SNE plot in Partek Flow offers several options for identifying, selecting, and classifying cells. In this tutorial, we will use the expression of known marker genes to identify cell types.
To visualize the expression of a marker gene, we can color cells on the t-SNE plot by their expression level.
Select any of the count data nodes from Get data on the left (Single cell counts, or any of the Filtered counts, Figure 8)
Search for the BCAN gene
Click and drag the BCAN gene onto the plot and drop it over the Green (feature) option
The cells will be colored from black to green based on their expression level of BCAN, with cells expressing higher levels more green (Figure 9). BCAN is highly expressed in glioma cells.
In Partek Flow, we can color cells by more than one gene. We will now add a second glioma marker gene, GPM6A.
Select any of the count data nodes from the Data card on the left (Single cell counts, or any of the Filtered counts)
Search for the GPM6A gene
Click and drag the GPM6A gene onto the plot and drop it over the Red (feature) option
Cells expressing GPM6A are now colored red and cells expressing BCAN are colored green. Cells expressing both genes are colored yellow, while cells expressing neither are colored black (Figure 10).
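The color mixing described above follows from additive RGB blending: full red plus full green gives yellow. The sketch below illustrates the idea; the linear scaling to each gene's maximum is an assumption for illustration, and the function name is hypothetical, so the actual color mapping in the data viewer may differ.

```python
def marker_color(green_expr, red_expr, green_max, red_max):
    """Map two marker expression values to an (R, G, B) color.

    Each channel is scaled linearly from 0 (no expression) to 255
    (the maximum observed for that gene).
    """
    def scale(value, vmax):
        if vmax <= 0:
            return 0
        fraction = min(max(value / vmax, 0.0), 1.0)  # clamp to [0, 1]
        return round(255 * fraction)

    return (scale(red_expr, red_max), scale(green_expr, green_max), 0)


print(marker_color(5.0, 5.0, 5.0, 5.0))  # (255, 255, 0) -> yellow, both markers high
print(marker_color(5.0, 0.0, 5.0, 5.0))  # (0, 255, 0)   -> green, BCAN-like marker only
print(marker_color(0.0, 0.0, 5.0, 5.0))  # (0, 0, 0)     -> black, neither marker
```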
Numerical expression levels for each gene can be viewed for individual cells.
Select a cell by pointing and clicking
The expression level for that cell is displayed on the legend for each gene. Expression values can also be viewed by mousing over a cell (Figure 11).
Deselect the cell by clicking on any blank space on the plot
Now that cells are colored by the expression of two glioma cell markers, we can classify any cell that expresses these genes as glioma cells. Because t-SNE groups cells that are similar across the high-dimensional gene expression data, we will consider cells that form a group where the majority of cells express BCAN and/or GPM6A as the same cell type, even if they do not express either marker gene.
Draw the lasso around the cluster of green, red, and yellow cells and click the circle to close the lasso (Figure 12)
Selected cells are shown in bold and unselected cells are dimmed. The number of selected cells is indicated in the figure legend. The cells are plotted on the color scale depending on their relative expression levels of the two marker genes (Figure 13)
Click Classify selection in the Classify icon under Tools
A dialog to give the classification a name will appear.
Name the classification Glioma
Click Save (Figure 14)
Once cells have been classified, the classification is added to Classify. The number of cells belonging to the classification is listed. In MGH42, there are 460 glioma cells (Figure 15).
Deselect the cells by clicking on any blank space on the plot
Open Axes and navigate to Sample under Misc
Rotate the 3D t-SNE plot to get a better view of cells from the green, red, and yellow cluster
Draw the lasso around the cluster of colored cells and click the circle to close the lasso (Figure 16)
Select Classify selection in the Classify icon
Type Glioma or select Glioma from the drop-down list (Figure 17)
Click Save
Repeat these steps for each of the 6 remaining samples. Remember to go back to the first sample (MGH36) to classify the glioma cells in that sample too.
There should be 5,322 glioma cells in total across all 8 samples.
The classification name can be edited or deleted (Figure 18).
With the malignant cells in every sample classified, it is time to save the classifications.
Click Apply classifications in the Classify icon
Name the classification attribute Cell type (sample level)
Click Run (Figure 19)
The new attribute is stored in the Data tab and is available to any node in the project.
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
For some data sets, cell types can be distinguished when all samples can be visualized together on one t-SNE plot. We will use a t-SNE plot of all samples to classify glioma, microglia, and oligodendrocyte cell types.
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Click the Filtered counts data node after the Filter features task
Click PCA in the Exploratory analysis section of the task menu
Uncheck the Split by sample checkbox (Figure 22)
Click Finish
The PCA task will run as a new green layer.
Click the new PCA data node
Select t-SNE from the Exploratory analysis section of the task menu
Click Finish to run the t-SNE task with default settings
The t-SNE task will be added to the green layer (Figure 23). Layers are created in Partek Flow when the same task is run on the same data node.
Once the task has completed, we can view the plot.
Double-click the green t-SNE data node to open the t-SNE scatter plot
Click and drag the 2D scatter plot icon onto the canvas and replace the 3D scatter plot (Figure 24)
Search for and select the green t-SNE data node (Figure 25)
In the Style icon, choose Sample name from the Color by drop-down list under Color
Viewing the 2D t-SNE plot, while most cells cluster by sample, there are a few clusters with cells from multiple samples (Figure 26).
Using marker genes, BCAN (glioma), CD14 (microglia), and MAG (oligodendrocytes), we can assess whether these multi-sample clusters belong to our known cell types.
Select any of the count data nodes from the Data card on the left (Single cell counts, or any of the Filtered counts)
Search for the BCAN gene
Click and drag the BCAN gene onto the plot and drop it over the Green (feature) option
Search for the CD14 gene
Click and drag the CD14 gene onto the plot and drop it over the Red (feature) option
Search for the MAG gene
Click and drag the MAG gene onto the plot and drop it over the Blue (feature) option
After coloring by these marker genes, three cell populations are clearly visible (Figure 27).
The red cells are CD14 positive, indicating that they are the microglia from every sample.
Draw the lasso around the cluster of red cells and click the circle to close the lasso (Figure 28)
Open the Classify tool and click Classify selection
Name the classification Microglia
Click Save
The blue cells are MAG positive, indicating that they are the oligodendrocytes from every sample.
Deselect the cells by clicking on any blank space on the plot
Draw the lasso around the cluster of blue cells and click the circle to close the lasso
Open the Classify tool and click Classify selection
Name the classification Oligodendrocytes
Click Save
Finally, we will classify the BCAN expressing cells on the plot as glioma cells from every sample.
Deselect the cells by clicking on any blank space on the plot
Draw the lasso around the cluster of green cells and click the circle to close the lasso
Open the Classify tool and click Classify selection
Name the classification Glioma
Click Save
Deselect the cells by clicking on any blank space on the plot
The numbers of cells classified as microglia, oligodendrocytes, and glioma are shown in Classify (Figure 29).
Click Apply classifications in the Classify icon (Figure 30)
Name the classification attribute Cell type (multi-sample) (Figure 31)
Click Run
The new attribute is now available for downstream analysis.
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
Click in the CD45RA_TotalSeqB row
Click in the top right corner of the table to open a volcano plot
The plot can be configured using various tools on the left. For example, the Style icon can be used to change the appearance of the points. The X- and Y-axes can be changed in the Axes icon. The Statistics icon can be used to set different fold-change and p-value thresholds for coloring up- and down-regulated genes. The in-plot controls can be used to transpose the volcano plot (Figure 14).
Click to create a new data node including only these significantly different genes
On the first 2D scatter plot (with protein markers), click in the top right corner
Click in the top right of the plot to switch back to pointer mode
On the second 2D scatter plot (with mRNA markers), click in the top right corner
Click in the top right corner of both 2D scatter plots, to remove them from the canvas
Click in the top right corner of the 3D UMAP plot
Click in the Select & Filter tool to include the selected points
Click in the top right of the plot to switch back to pointer mode
Add the Biomarkers table using the Table option in the New plot menu. You can drag and reposition the table using the button in the top left corner of the plot.
If you need to create more space on the canvas, hide the panel words on the left using the arrow.
In Select & Filter, click to remove the CD3_TotalSeqB filtering rule
Click in Select & Filter to exclude the cluster 6/Tfh cells
Click in Select & Filter to exclude the cluster 4/Cytotoxic cells
Click in the top right corner of the UMAP plot
Click in Select & Filter to include the selected points
Click in Select & Filter to exclude the selected points
Click in the top right corner of the UMAP plot
Optionally, you may wish to save this data viewer session if you need to go back and reclassify cells later. To save the session, click the icon on the left and name the session.
Click the blue + Add sample button then use the green Add sample button to add each sample's Xenium output bundle folder. If you have not already transferred the folder to the server, this can be done using Transfer files to the server (Figure 3).
Once the folder has been uploaded to the server, navigate to the appropriate folder for each sample using Add sample (Figure 5).
Select cell_type from the drop-down and click the green Add button
Click the blue circle node to the right of the Color by drop-down
Download this table with more than 10 features using the Download option
We will create a list with these 10 genes so that we can use it in the Gene set enrichment task.
Here we are going to perform on our top 10 features for the FASN high group that we have added as a list called "Top 10 FASN high Features".
Please click for more information on Biological interpretation.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
The sample table is pre-populated with two sample attributes: # Cells and Subtype. Sample attributes can be added and edited manually by clicking Manage in the Sample attributes menu on the left. If a new attribute is added, click Assign values to assign samples to different groups. Alternatively, you can use the Assign values from a file option to assign sample attributes using a tab-delimited text file. For more information about sample attributes, see .
Click under Filter to include the selected cells
For more information on normalizing data in Partek Flow, please see the section of the user manual.
If you are new to Partek Flow, please see for information about data transfer and import and for information about the Partek Flow user interface.
This tutorial uses a if you would like to follow along exactly.
To learn more about how to run tasks in Flow, please refer to our online .
TSS enrichment: Transcriptional start site (TSS) enrichment score. The ENCODE project has defined an ATAC-seq targeting score based on the ratio of fragments centered at the TSS to fragments in TSS-flanking regions (see ). Poor ATAC-seq experiments typically will have a low TSS enrichment score.
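As a rough illustration of the ratio described above, the following Python sketch computes a TSS enrichment score from an aggregate coverage profile. It is a simplification under stated assumptions and does not reproduce the ENCODE pipeline or Partek Flow's calculation: the function name, the profile layout (one value per position in a window centered on the TSS), and the 100-position flank width are all hypothetical choices for the example.

```python
def tss_enrichment(profile, flank_width=100):
    """Ratio of signal at the TSS to the mean signal in the window flanks.

    profile     : aggregate fragment counts per position, in a window
                  centered on the TSS (assumed layout for this sketch)
    flank_width : number of positions at each end of the window that are
                  treated as background (hypothetical default)
    """
    if len(profile) < 2 * flank_width + 1:
        raise ValueError("profile is too short for the requested flank width")
    flanks = profile[:flank_width] + profile[-flank_width:]
    background = sum(flanks) / len(flanks)
    if background == 0:
        raise ValueError("flanking signal is zero; score is undefined")
    center = profile[len(profile) // 2]
    return center / background


# Synthetic profile: flat background of 2.0 with a peak of 20.0 at the TSS
profile = [2.0] * 200 + [20.0] + [2.0] * 200
score = tss_enrichment(profile)
```

A flat profile gives a score of 1, so a low score indicates that fragments are not concentrated at transcription start sites.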
Click the filter icon and Apply observation filter to run the Filter cells task on the first Single cell ATAC counts data node. This generates a Filtered cells node
Latent semantic indexing (LSI) was first introduced for the analysis of scATAC-seq data by [2]. LSI consists of term frequency-inverse document frequency (TF-IDF) normalization followed by singular value decomposition (SVD). Partek Flow wraps Signac's TF-IDF normalization for single cell ATAC-seq datasets. It is a two-step normalization procedure that normalizes both across cells, to correct for differences in cellular sequencing depth, and across peaks, to give higher values to rarer peaks [3].
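The two normalization steps can be sketched in plain Python. The formulation below, log(1 + TF × IDF × scale factor), is one common variant of TF-IDF for peak counts; treating it as the variant Partek Flow runs is an assumption, and the function name, the scale factor of 10,000, and the toy matrix are illustrative only. The SVD step of LSI, which would normally follow using a linear algebra library, is not shown.

```python
import math


def tf_idf(counts, scale_factor=10_000):
    """TF-IDF normalize a peaks-by-cells count matrix (list of rows).

    TF  = count / total counts in the cell    (corrects sequencing depth)
    IDF = number of cells / total counts for the peak (up-weights rare peaks)
    Returns log(1 + TF * IDF * scale_factor) for every entry.
    """
    n_peaks = len(counts)
    n_cells = len(counts[0])
    cell_totals = [sum(counts[p][c] for p in range(n_peaks)) for c in range(n_cells)]
    peak_totals = [sum(row) for row in counts]

    normalized = []
    for p, row in enumerate(counts):
        idf = n_cells / peak_totals[p] if peak_totals[p] else 0.0
        out_row = []
        for c, value in enumerate(row):
            tf = value / cell_totals[c] if cell_totals[c] else 0.0
            out_row.append(math.log1p(tf * idf * scale_factor))
        normalized.append(out_row)
    return normalized


# Toy matrix: 3 peaks (rows) x 4 cells (columns)
counts = [
    [1, 1, 1, 1],  # peak open in every cell
    [1, 0, 0, 0],  # rare peak, open in one cell only
    [2, 1, 0, 1],
]
normalized = tf_idf(counts)
```

In the first cell, the rare peak receives a higher normalized value than the ubiquitous peak even though both have a raw count of 1, which is the effect of the IDF term.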
Click to activate Lasso mode
Hurdle model produces a Feature list task node. The results table and options are the same as the task report except for the last two columns. The percentages of cells in which the feature is detected (value is above the background threshold) in each group (Pct(group1), Pct(group2)) are calculated and included in the Hurdle model report.
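The Pct columns can be understood with a small Python sketch. This only illustrates the definition given above (the share of cells whose value exceeds a background threshold); the function name, the threshold of 0, and the example values are assumptions, not Partek Flow's code.

```python
def pct_detected(values, threshold=0.0):
    """Percentage of cells whose value is above the background threshold."""
    if not values:
        raise ValueError("at least one cell is required")
    detected = sum(1 for v in values if v > threshold)
    return 100.0 * detected / len(values)


# Hypothetical expression values of one feature across eight cells
expression = [0.0, 1.2, 0.0, 3.4, 2.1, 0.0, 0.0, 0.0]
groups = ["group1"] * 4 + ["group2"] * 4

pct_by_group = {}
for group in sorted(set(groups)):
    cells = [v for v, g in zip(expression, groups) if g == group]
    pct_by_group[group] = pct_detected(cells)
```

Here the feature is detected in two of four cells in group1 and one of four cells in group2.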
Once we have filtered a list of differentially expressed genes, we can visualize these genes by generating a , or perform the Gene set enrichment analysis and .
For information about automating steps in this analysis workflow, please see our documentation page on .
Cusanovich, D., Reddington, J., Garfield, D. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555, 538–542 (2018).
Click to view the Volcano plot
In Select & Filter, click to remove the P-value (Glioma vs Oligodendrocytes) selection rule. From the drop-down list, add FDR step up (Glioma vs Oligodendrocytes) as a selection rule and set the maximum to 0.001
Click to apply the filter and generate a Filtered Feature list node
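To make the FDR filter above concrete, here is a minimal Python sketch of a step-up false discovery rate adjustment followed by a 0.001 cutoff. It assumes that "FDR step up" refers to the Benjamini-Hochberg procedure; the function name, gene names, and P-values are hypothetical, and this is not Partek Flow's own code.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg step-up adjusted P-values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genes and raw P-values
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
raw_p = [0.00001, 0.0002, 0.03, 0.5]

fdr = fdr_step_up(raw_p)
max_fdr = 0.001  # the maximum set in the selection rule above
kept = [name for name, q in zip(names, fdr) if q <= max_fdr]
```

Because the adjustment depends on the total number of tests, the same raw P-value can pass or fail the cutoff depending on how many features were tested.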
Select the icon below the Sample name to go to the next sample
Switch to pointer mode by clicking in the top right corner of the plot
Switch to lasso mode by clicking in the top right of the plot
Classifications made on the t-SNE plot are retained as a draft as part of the data viewer session. In this tutorial, we will classify malignant cells for each sample before we save and apply the classifications, but if necessary, you can save the data viewer session by clicking the Save icon on the left to retain all of the formatting and draft classifications. The data viewer session will be stored under the Data viewer tab and can be re-opened to continue making classifications at a later time.
Switch to pointer mode by clicking in the top right corner of the plot
Select the icon below the sample name to go to the next sample, MGH45
Switch to lasso mode by selecting in the top right corner of the plot
Switch to lasso mode by clicking the icon in the top right of the plot
Switch to pointer mode by clicking in the top right corner of the plot
Switch to lasso mode again by clicking the icon in the top right of the plot
Switch to pointer mode by clicking in the top right corner of the plot
Switch to lasso mode again by clicking the icon in the top right of the plot
Switch to pointer mode by clicking in the top right corner of the plot