When a feature (gene) has low expression, it may be filtered out by DESeq2's automatic independent filtering. To avoid this, you can either filter out low-expression features before running DESeq2, or change the apply independent filtering setting in the DESeq2 advanced options. Details about independent filtering can be found in the DESeq2 documentation.
Click here for troubleshooting other differential analysis models and "?" results
Fold change indicates the extent of increase or decrease in feature expression in a comparison. In Partek Flow, fold change is on a linear scale (even if the input data is on a log scale). It is converted from the ratio, which is the LSmean of group one divided by the LSmean of group two in your comparison. When the ratio is greater than 1, the fold change is identical to the ratio; when the ratio is less than 1, the fold change is -1/ratio. There are no fold change values between -1 and 1. A ratio/fold change of 1 means there is no change between the two groups.
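The ratio-to-fold-change conversion described above can be sketched as follows (a minimal illustration of the rule, not Partek's own code):

```python
def ratio_to_fold_change(ratio):
    """Convert a linear ratio (LSmean of group 1 / LSmean of group 2)
    to a Partek-style signed fold change."""
    if ratio >= 1:
        return ratio        # up-regulation: fold change equals the ratio
    return -1.0 / ratio     # down-regulation: negative reciprocal of the ratio
```

For example, a ratio of 2 gives a fold change of 2, a ratio of 0.25 gives a fold change of -4, and a ratio of 1 gives 1 (no change).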
The Log ratio option in Partek Flow is converted from the ratio; this value is comparable to the log fold change reported by some other tools.
Yes, go to Style in the Data Viewer and make sure Gene name is selected under "Labeling". Next, go to the in-plot selection tools (right side of the graphic) and use any of them to select the points that you would like to label. You can use Ctrl or Shift to select multiple populations at once. For more information on the Volcano plot click here.
By default, Partek Flow uses p-value <= 0.05 and |fold change| >= 2 as the significance cutoffs. Genes that meet both the p-value and fold change cutoffs are significantly up- or down-regulated. Genes that meet only one criterion are called inconclusive, and genes that pass neither are not significant. Click the Statistics button in the Configure section of the left control panel to change the cutoffs; click the Style button to change the colors of the significance categories.
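The classification logic described above can be sketched as follows (the function name is hypothetical; the cutoff defaults match those stated in the text):

```python
def significance_category(p_value, fold_change, p_cutoff=0.05, fc_cutoff=2):
    """Classify a gene into the significance categories described above."""
    passes_p = p_value <= p_cutoff
    passes_fc = abs(fold_change) >= fc_cutoff
    if passes_p and passes_fc:
        return "up-regulated" if fold_change > 0 else "down-regulated"
    if passes_p or passes_fc:
        return "inconclusive"   # meets only one of the two criteria
    return "not significant"
```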
FDR is the expected proportion of false discoveries among all discoveries. FDR step-up is a particular method for keeping the FDR under a given level, alpha, proposed in this paper. In Partek Flow, if one calls all of the features with p-values of 0.02 or less, the FDR is less than or equal to 0.41.
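A compact sketch of a step-up adjustment, assuming the Benjamini-Hochberg procedure commonly referred to as "FDR step-up" (this is an illustration, not Partek's implementation):

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg step-up: return FDR-adjusted p-values,
    one per input p-value, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                 # walk from largest p downward
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min                # enforce monotonicity
    return adjusted
```

Features whose adjusted value falls at or below alpha are the discoveries kept under that FDR level.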
You should have at least the following two attributes in the Metadata: treatment (with two subgroups) and subject ID (to pair the two samples). When performing differential analysis, choose ANOVA and include both attributes in the ANOVA model; this two-way ANOVA is mathematically equivalent to a paired t-test.
Yes, you can use the Compute biomarkers task to compare one subgroup at a time against all of the others combined. Alternatively, you can set up the differential analysis model to make this comparison; for more information, please see the documentation for each model.
In the Quantify to annotation model dialog, Partek Flow by default filters features based on the total count across all of the samples, and only features with a total count greater than 10 are reported. If you want to report all of the genes in the annotation file, change the Filter features value to 0.
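As an illustration, the default total-count filter behaves like this sketch (the function name and data layout are hypothetical, not Partek's code; the strict "greater than" comparison follows the text above):

```python
def filter_features(counts, min_total=10):
    """Keep features whose total count across all samples exceeds min_total.
    `counts` maps feature name -> list of per-sample counts."""
    return {gene: vals for gene, vals in counts.items() if sum(vals) > min_total}

counts = {"A": [5, 6], "B": [0, 3], "C": [0, 0]}
kept_default = filter_features(counts)            # only "A" (total 11 > 10)
kept_zero = filter_features(counts, min_total=0)  # "A" and "B"
```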
Partek software enables researchers to easily perform genomic data analysis without ever needing to write a single line of code or sacrificing statistical power or advanced functionality. From alignment to pathway analysis, Partek provides a seamless, integrated analysis solution on a single platform that provides the power of a cloud or cluster when needed, and the convenience of desktop software for less compute intensive tasks.
Here you will find documentation on how to use and administer our products.
Partek Flow is a web-based application for genomic data analysis and visualization. It can be installed on a desktop computer, computer cluster or cloud. Users can then access Partek Flow from any browser-enabled device, such as a personal computer, tablet or smartphone.
Read on to learn about the following installation topics:
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
To create a project, you first need to create a new project from the Home page, and then import the files into your project using the import data wizard; see the documentation for details.
Yes. Click your avatar at the top right corner of the interface, navigate to your profile settings, and click the "Change image" button.
Click your avatar in the top right corner of the Partek Flow interface, open the menu, and select the list management option from the left panel. Lists can also be generated from result tables using the "Save as managed list" button. For more information, please see the documentation.
Yes. Click the rectangular task node whose parameters you want to change. In the context-specific menu on the right, under Task actions, select 'Rerun with downstream tasks'. This brings you to the task setup page, where you can edit the parameters; click Finish to run the task with the new settings. The tasks downstream of it will be re-run automatically.
Use the gene set scoring task to identify cells with active gene sets; this task calculates a value for each cell by ranking all genes by their expression level in the cell and determining what proportion of the genes from the gene list fall within the top 5% (default cutoff) of genes. An alternative option is to use the gene score for a feature list to select and filter populations based on the distribution.
Yes, click the pipelines button at the bottom of the tab dashboard. This lets you import either our hosted pipelines or your own saved pipelines. See the documentation for steps to save and run a pipeline, and for more information on navigating pipelines.
Classification in Partek Flow can be performed manually or with automatic cell classification, which is explained in more detail in the documentation. Users often want to classify cells by gene expression threshold(s); see the documentation for details on classification by marker expression. Automatic classification needs to be performed on a non-normalized single cell data node; once complete, the classification can be used in visualizations and tasks. You may choose to perform clustering and the Compute biomarkers task to help identify markers that distinguish the clusters, and we also provide hosted lists for different cell types.
The annotation model is a file containing feature locations. This file can be used to quantify to an annotation model in RNA-Seq analysis, or to annotate variants or peaks in a DNA-Seq or ATAC-Seq/ChIP-Seq analysis pipeline. The file format should be .gtf/.gff/.bed.
Genome coordinates for annotation models stored in Partek Flow are 1-based, start-inclusive, and stop-exclusive. This means that the first base position starts from one, the start coordinate of a feature is included in the feature, and the stop/end coordinate is not. These are the genome coordinates printed in the various task reports and output files whenever an annotation model is involved in a task. When custom annotation files are added to Partek Flow, the genome coordinates are converted into this format, and converted back if necessary for a specific task. Genome coordinate conventions vary between annotation formats.
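Assuming the standard conventions of the supported formats (BED: 0-based, half-open; GTF/GFF: 1-based, both ends inclusive), the conversions into the stored convention can be sketched as:

```python
def bed_to_stored(start, end):
    """BED (0-based, start-inclusive, end-exclusive) ->
    stored (1-based, start-inclusive, stop-exclusive): shift both by +1."""
    return start + 1, end + 1

def gtf_to_stored(start, end):
    """GTF/GFF (1-based, both ends inclusive) ->
    stored (1-based, start-inclusive, stop-exclusive): shift stop by +1."""
    return start, end + 1
```

For example, a 10 bp feature covering bases 1-10 is `(0, 10)` in BED and `(1, 10)` in GTF; both map to the stored `(1, 11)`.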
Yes. To add transgenes (such as GFP) to the reference files, first choose an assembly, create the transgene reference, and then merge the references together (e.g. combine mm10 with dtTomato). The same process applies to the annotation file.
RPM (reads per million) is the same as Total Count. Please use Total Count.
For genes with multiple transcripts, one of the transcripts is picked as the canonical transcript. Based on the UCSC definition from the table browser,
knownCanonical - identifies the canonical isoform of each cluster ID, or gene. Generally, this is the longest isoform.
We define the canonical transcript as either the longest CDS (coding DNA sequence) if the gene has translated transcripts, or otherwise the longest cDNA.
The Partek E/M quantification algorithm can give decimal values because of multi-mapping reads (the same read potentially aligning to multiple locations) and overlapping transcripts/genes (a read that maps to a location with multiple transcripts or genes at that location). In these scenarios, the read count will be split.
For example, if a read maps to two potential locations, then that read contributes 0.5 counts to the first location and 0.5 counts to the second location. Similarly, if a read maps to one location with two overlapping genes, then that read contributes 0.5 counts to the first gene and 0.5 counts to the second gene.
If you need to remove the decimal points for downstream analysis outside of Partek Flow, you can round the values to the nearest integer.
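The fractional counting described above can be sketched as follows (a simplification of the actual E/M algorithm; the function name and input layout are hypothetical):

```python
from collections import defaultdict

def split_read_counts(read_hits):
    """Split each read's single count evenly across all features it hits.
    `read_hits` holds, per read, the list of features that read maps to."""
    counts = defaultdict(float)
    for features in read_hits:
        weight = 1.0 / len(features)   # e.g. 0.5 when a read hits two features
        for feature in features:
            counts[feature] += weight
    return dict(counts)

# One read maps uniquely to geneA; a second read overlaps geneA and geneB.
counts = split_read_counts([["geneA"], ["geneA", "geneB"]])
```

Here geneA receives 1.5 counts and geneB receives 0.5, matching the example above; rounding to the nearest integer can then be applied for tools that require integer counts.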
For variants with multiple alternative alleles, the variant report has one row covering all alternative alleles, while the summarize cohort mutations report lists each alternative allele in a separate row. The number of variants listed at the top of each report is calculated from the number of rows in that report.
For a multi-sample project, all downstream tasks are run separately for each sample if 'Split by sample' was checked when performing the PCA task. The sample displayed can be switched with the 'Sample' option in the 'Misc' section of the Axes card. To show different samples side by side, click 'Duplicate plot' first, then use the 'Sample' option to switch samples.
The Flip mode and download all data options are disabled if there are more than 2.5 million values (rows x columns) in the heatmap.
By default, genes are selected if the p-value is <= 0.05 and |fold change| >= 2; when fewer than 2000 genes are selected, they will be labeled. You can click the Style button in the Configure section and choose a gene annotation field from the Label by drop-down list to change the label. If the number of selected genes is less than or equal to 100, Partek Flow will try to spread the labels out as much as possible to display them clearly. If more than 100 genes are selected, labels are placed next to the selected genes, so labels will overlap where genes are close together. If more than 2000 genes are selected, no labels are displayed.
Clicking any blank space clears the current selection; you can then use the selection mode buttons on the vertical bar in the upper-right corner of the plot to manually select points on the plot.
In Partek Flow, GSEA should be performed on a sample/cell-by-feature matrix data node (e.g. normalized count data); it detects gene sets/pathways that are significantly different between two groups. Gene set enrichment, by contrast, should be performed on a filtered gene list; it identifies overrepresented gene sets/pathways in that list using Fisher's exact test. Its input is a filtered list of gene names.
The enrichment score shown in the enrichment report is the negative natural log of the enrichment p-value derived from Fisher's exact test. The higher the enrichment score, the more overrepresented our list of genes is in the gene set of a GO/pathway category.
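The score transformation just described is simply:

```python
import math

def enrichment_score(p_value):
    """Enrichment score = negative natural log of the Fisher's exact p-value."""
    return -math.log(p_value)
```

For example, a p-value of 0.05 gives a score of about 3.0, and smaller p-values give larger scores.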
For Gene set enrichment analysis, only genes from the input data node (filtered gene list) will be colored in the KEGG pathway gene network, using the statistics in the data node.
During GSEA (or Gene set ANOVA) computation, we also perform ANOVA on each gene based on the selected attribute, independently of the GSEA computation (which is at the gene set level). The ANOVA results are only used to color the genes in the KEGG gene network. If GSEA is computed against another database, e.g. GO, we don't compute ANOVA on each gene, since the GO database doesn't have gene network information.
We recommend cleaning up projects and removing library files that you do not need, then removing the orphaned files. You can also export analyzed projects and save them on an external machine; when you need them again, you can import them back to the server. Please see the documentation for more details. Right-click a data node to delete files from projects that are no longer needed (e.g. fastq files from pipelines that have been fully analyzed); note that you will not be able to perform tasks from that node once the files are deleted.
To add a new assembly, open the library file management page. From the Assembly drop-down list, select Add assembly and specify the species. If the species name is not in the list, choose Other and type in the name with the assembly version (multiple assembly versions can exist for one species, e.g. hg19 and hg38 for Homo sapiens). You need to add the reference file, a .fasta file containing the sequence information. Once the reference file is added, you can build any aligner index to perform the alignment task.
We recommend looking for the species files on the relevant genome database website. There is no need to unzip or save these files to your local machine; instead, right-click and copy the link address of the specific file (not a link to a folder). For more details, see the documentation chapter.
Left-click to select the data node you want to export. At the bottom of the task menu there will be an option to download the data.
When working with paired-end data, FPKM is available; when working with single-end data, RPKM is available. These metrics are essentially analogous, differing only in the underlying unit of counting: FPKM counts fragments, so the two reads of a properly paired fragment are not counted twice. A simple description of the difference in calculation between RPKM and FPKM is given in the documentation.
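Assuming the standard definitions (reads or fragments, per kilobase of feature length, per million mapped reads or fragments), the two calculations can be sketched as:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)

def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Same formula, but counting fragments, so a properly paired
    read pair contributes one fragment rather than two reads."""
    return fragment_count / (gene_length_bp / 1e3) / (total_mapped_fragments / 1e6)
```

For example, 500 reads on a 2 kb gene in a library of 10 million mapped reads gives an RPKM of 25.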
Both methods should be performed on a normalized matrix data node and require gene symbols in the feature annotation. Both detect differentially expressed gene sets (pathways) rather than individual genes, but the algorithms are different: GSEA is a popular method from the Broad Institute, while Gene Set ANOVA is based on a generalized linear model; here are the details.
If you would like specific groups (e.g. cell types) in a certain order, do not perform hierarchical clustering on these cells; instead choose to assign the order, then use click-and-drag to reorder the groups. If you want to remove a group, you can exclude it in the filtering section. You can still perform hierarchical clustering on the features if you wish. Hierarchical clustering forces the heatmap into its cluster order, and you would need to click the dendrogram nodes to switch the order. Click here for more information.
Yes, the default settings can be modified by clicking "Configure" in the Advanced settings during task set-up, then change the "feature scaling" option to "none" to plot the values without scaling. For more information related to the heatmap click here.
This guide covers the basics of using Partek Flow. Partek Flow can be installed on a server, computer cluster, or the cloud. Regardless of where it is installed, it can be accessed with any web browser; we recommend Google Chrome.
This guide covers:
Logging in to your Partek Flow account will bring up the Home page (Figure 1). This page will show recent activities you've performed, recent projects you've worked on and pertinent details about each project.
Select the type of data (Single cell, Bulk, Other), choose the assay type, and select the data format. Partek Flow accepts various data types. Use the Next button to proceed with import.
There are three ways you can upload the data:
From your Partek Flow server (click here for more information)
From a URL
From a GEO / ENA Bioproject (click here for more information)
Because genomics datasets are generally large, it is ideal to have the data copied in a folder directly accessible to the Partek Flow server. Make sure that the directory has the appropriate permissions for Partek Flow to read and write files in that folder. You may wish to seek assistance from your system administrator in uploading your data directly.
Select the files you would like to create samples from. Once they've been created, assign the corresponding sample attributes for each sample using the Metadata tab. The most efficient way to assign sample attributes is by clicking Assign sample attributes from a file and uploading a tab delimited text file. The file should contain a table with the following:
The first row lists the attribute names (e.g. Treatment, Exposure) and
The first column of the table lists the sample names (the sample names in the file must be identical to the ones listed in the Sample name column in the Data tab)
List the corresponding attributes for each sample in the succeeding columns
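The steps above produce a simple tab-delimited table; this sketch writes one (sample names and attribute values here are hypothetical placeholders, for illustration only):

```python
import csv

# First row: attribute names; first column: sample names matching the Data tab.
rows = [
    ["Sample name", "Treatment", "Exposure"],
    ["Sample_1", "Control", "2h"],
    ["Sample_2", "Drug", "2h"],
    ["Sample_3", "Drug", "24h"],
]

with open("sample_attributes.txt", "w", newline="") as handle:
    csv.writer(handle, delimiter="\t").writerows(rows)
```

The resulting text file can then be uploaded via Assign sample attributes from a file.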
After samples have been added and associated with valid data files, a data node will appear in the Analyses tab (Figure 3). The Analyses tab is where you can invoke tasks, using the context sensitive menu on the right, and view the results of your analysis.
To add more data use the Add data task in the menu on the right or Add data in the Metadata tab. Once a task is performed, data can no longer be added to the project.
The Analyses tab contains two elements: data nodes (circles) and task nodes (rounded rectangles) connected by lines and arrows. Collectively, they represent a data analysis pipeline (Figure 4).
Clicking a data node brings up a context sensitive menu on the right (Figure 5). This menu changes depending on the type of data node. It will only present tasks which can be performed on that specific data type. Hover over the task to obtain additional information regarding each option.
Depending on the task, a new data node may automatically be created and connected to the original data node. This contains the data resulting from the task. Tasks that do not produce new data types, such as Pre-alignment QA/QC, will not produce an additional data node.
To view the results of a task, click the task node and choose the Task report option on the menu.
Data associated with any data node can be downloaded by clicking the node and choosing Download data at the bottom of the task menu (Figure 8). Compressed files will be downloaded to the local computer from which the user is accessing the Partek Flow server. Note that bigger files (such as unaligned reads) will take longer to download. For guidance, a file size estimate is provided for each data node. Downloaded files can be seamlessly imported into Partek® Genomics Suite®.
Watch a webinar on how to set up a simple RNA-Seq project.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Ready to start work on your Partek Flow hosted trial? This page has some helpful videos to get you started!
Don't have a trial server yet? Request one on our website.
This short video shows you how to import your data into a hosted instance of Partek Flow. Adjust your device's volume for optimal sound.
Note: Uploading large data files may take a while; please disable sleep mode on your computer during the upload.
In this short video, we'll give you an overview of the interface and how to get started with your analysis.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
From the Home Page, click the New project button . Assign a name to the project and click the Create project button.
Upon creation of a new project, the Analyses tab will appear, prompting you to add samples to your project. Click the blue Add data button (Figure 2).
Select the task you wish to perform from the menu. When configuring task options, additional information regarding each option is available (Figure 6). When available, hover over Tooltips or click the video help for decision making. Click Finish to perform the task.
Click Save on any visualization to export a publication-quality image (Figure 7).
Regardless of whether Partek Flow is installed on a server or the cloud, users will be interacting with the software using a web browser. We support the latest Google Chrome, Mozilla Firefox, Microsoft Edge and Apple Safari browsers. While we make an effort to ensure that Partek Flow is robust, note that some browser plugins may affect the way the software is viewed on your browser.
If you are installing Partek Flow on your own single-node server, we require the following for successful installation:
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8 or later versions of these distributions
64-bit 2GHz quad-core processor1
48GB of RAM2
> 2TB of storage available for data
> 100GB on the root partition
A broadband internet connection
We support Docker-based installations. Please contact support@partek.com for more information.
1Note that some analyses have higher system requirements. For example, to run the STAR aligner on a reference genome of ~3 GB (such as human, mouse, or rat), 16 cores are required.
2Input sample file size can also impact memory usage, which is particularly the case for TopHat alignments.
Increasing hardware resources (cores, RAM, disk space, and speed) will allow for faster processing of more samples.
If you are licensed for the Single Cell Toolkit, please see Single Cell Toolkit System Requirements for amended hardware requirements.
Please contact Partek Technical Support if you would like to install Partek Flow on your own HPC or cloud account. We will assist in assessing your hardware needs and can make recommendations regarding provisioning sufficient resources to run the software.
Proper storage planning is necessary to avoid future frustration and costly maintenance. Here are several DO's and DO NOT's:
DO:
Plan for at least 3 to 5 times more storage than you think is necessary. Investing extra funds in storage and storage performance is always worth it.
Keep all Flow data on a single partition that is expandable, such as RAID or LVM.
Back up your data, especially the Partek Flow database.
DO NOT:
Store data on 'removable' USB drives. Partek Flow will not be able to see these drives.
Store data across multiple partitions or folder locations. This will increase the maintenance burden substantially.
Use non-Linux file systems like NTFS.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This document describes how to set up and configure a single-node Partek Flow license.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Because of the large size of single cell RNA-Seq data sets and the computationally-intensive tools used in single cell analysis, we have amended our system requirements and recommendations for installations of Partek Flow with the Single Cell toolkit.
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 64 GB of RAM
Local scratch space*: 1 TB with cached or native speeds of 2GB/s or higher
Storage: > 2 TB available for data and > 100 GB on the root partition
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 128 GB of RAM
Local scratch space*: 2 TB with cached or native speeds of 2GB/s or higher
Storage: > 2 TB available for data and > 100 GB on the root partition
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 256 GB of RAM
Local scratch space*: 2 TB with cached or native speeds of 2GB/s or higher
Storage: > 4 TB available for data
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 512 GB of RAM
Local scratch space*: 10 TB with cached or native speeds of 2GB/s or higher
Storage: 10 TB available for data
For fastest performance:
Newer generation CPU cores with AVX2 or AVX-512 are recommended.
Performance scales proportionally to the number of CPU cores available.
Hyper-threaded cores (threads) scale performance for most operations other than principal component analysis.
*Contact Partek support for recommended setup of local scratch storage
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Partek Flow is a genomics data analysis and visualization software product designed to run on compute clusters. The following instructions assume the most basic setup of Partek Flow and should only be attempted by system administrators who are familiar with Linux-based commands. These instructions are not intended to be comprehensive: cluster environments vary widely, so there are no 'one size fits all' instructions, and the installation procedure depends heavily on the type of computer cluster and the environment in which it is located. We support a large array of Linux distributions and configurations. In all cases, Partek Technical Support will be available to assist with cluster installation and maintenance to ensure compatibility with any cluster environment. Please consult with Partek Licensing Support (licensing@partek.com) for additional information.
Prior to installation, make sure you have the license key related to the host-ID of the compute cluster the software will be installed in. Contact licensing@partek.com for key generation.
Create a standard Linux user account that will run the Partek Flow server and all associated processes. It is assumed this account is synced between the cluster head node and all compute nodes. For this guide, we name the account flow.
Log into the flow account and cd to the flow home directory
Download Partek Flow and the remote worker package
Unzip these files into the flow home directory /home/flow. This yields two directories: partek_flow and PartekFlowRemoteWorker
Partek Flow can generate large amounts of data, so it needs to be configured to store the bulk of this data in the largest shared data store available. For this guide we assume that directory is located at /shared. Adjust this path accordingly.
It is required that the Partek Flow server (which runs on the head node) and the remote workers (which run on the compute nodes) see identical file system paths for any directory Partek Flow has read or write access to. Thus /shared and /home/flow must be mounted on the Flow server and all compute nodes. Create the directory /shared/FlowData and give the flow Linux account write access to it
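A sketch of that directory setup. Here SHARED_ROOT stands in for /shared so the commands can be rehearsed without root; in production, run the commented equivalent as root so the flow account owns the directory.

```shell
# SHARED_ROOT stands in for /shared; point it at the real shared mount
# in production. The chown step (root only) is shown as a comment.
SHARED_ROOT="${SHARED_ROOT:-$PWD/shared-demo}"
mkdir -p "$SHARED_ROOT/FlowData"
chmod 775 "$SHARED_ROOT/FlowData"
# production equivalent, run as root:
#   mkdir -p /shared/FlowData && chown flow:flow /shared/FlowData
ls -ld "$SHARED_ROOT/FlowData"
```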
It is assumed the head node is attached to at least two separate networks: (1) a public network that allows users to log in to the head node and (2) a private backend network that is used for communication between compute nodes and the head node. Clients connect to the Flow web server on port 8080, so adjust the firewall to allow inbound connections to 8080 over the public network of the head node. Partek Flow will connect to remote workers over your private network on ports 2552 and 8443, so make sure those ports are open to the private network on the Flow server and workers.
Partek Flow needs to be informed of which private network to use for communication between the server and workers. There may be several private networks available (gigabit, InfiniBand, etc.), so select one; we recommend using the fastest network available. For this guide, assume the private network is 10.1.0.0/16. Locate the head node hostname that resolves to an address on the 10.1.0.0/16 network. This must resolve to the same address on all compute nodes.
For example:
host head-node.local yields 10.1.1.200
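You can script that resolution check and run it on the head node and on each compute node (head-node.local is this guide's example name and will not resolve on your cluster; substitute your own):

```shell
# Resolve the head node name; every node must report the same private
# address. head-node.local is only an example hostname.
HEADNODE="${HEADNODE:-head-node.local}"
RESOLVED=$(getent hosts "$HEADNODE" | awk '{print $1}')
echo "${RESOLVED:-no address found for $HEADNODE}"
```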
Open /home/flow/.bashrc and add this as the last line:
Source .bashrc so the environment variable CATALINA_OPTS is accessible.
NOTE: If workers are unable to connect (below), then replace all hostnames with their respective IPs.
Start Partek Flow
You can monitor progress by tailing the log file partek_flow/logs/catalina.out. After a few minutes, the server should be up.
Make sure the correct ports are bound
You should see 10.1.1.200:2552 and :::8080 as LISTENing. Inspect catalina.out for additional error messages.
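One way to sketch that check as a script (ss from iproute2 is assumed to be available, with netstat as a fallback):

```shell
# Count listening sockets on the Flow ports; expect 2 (2552 and 8080)
# once the server is fully up, and 0 before it starts.
NUM_BOUND=$( { ss -tln 2>/dev/null || netstat -tln 2>/dev/null; } \
             | grep -cE ':(2552|8080)([[:space:]]|$)' || true )
echo "Flow ports bound: $NUM_BOUND"
```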
Open a browser and go to http://localhost:8080 on the head node to configure the Partek Flow server.
Enter the license key provided (Figure 1)
Figure 1. Setting up the Partek Flow license during installation
If there appears to be an issue with the license or there is a message about 'no workers attached', then restart Partek Flow. It may take 30 sec for the process to shut down. Make sure the process is terminated before starting the server back up:
Then run:
You will now be prompted to setup the Partek Flow admin user (Figure 2). Specify the username (admin), password and email address for the administrator account and click Next
Figure 2. Setting up the Partek Flow 'admin' account during installation
Select a directory folder to store the library files that will be downloaded or generated by Partek Flow (Figure 3). All Partek Flow users share library files, and the size of the library folder can grow significantly, so we recommend that at least 100GB of free space be allocated for library files. The free space in the selected library file directory is shown. Click Next to proceed. You can change this directory after installation by changing system preferences. For more information, see Library file management.
Figure 3. Selecting the library file directory
To set up the Partek Flow data paths, click on Settings located on the top-right of the Flow server webpage. On the left, click on Directory permissions then Permit access to a new directory. Add /shared/PartekFlow and allow all users access.
Next click on System preferences on the left menu and change data download directory and default project output directory to /shared/PartekFlow/downloads and /shared/PartekFlow/project_output respectively
Note: If you do not see the /shared folder listed, click on the Refresh folder list link that is toward the bottom of the download directory dialog
Since you do not want to run any work on the head node, go to Settings>System preferences>Task queue and job processing and uncheck Start internal worker at Partek Flow server startup.
Restart the Flow server:
After 30 seconds, run:
This is needed to disable the internal worker.
Test that remote workers can connect to the Flow server
Log in as the flow user to one of your compute nodes. Assume the hostname is compute-0. Since your home directory is exported to all compute nodes, you should be able to go to /home/flow/PartekFlowRemoteWorker/
To start the remote worker:
These two addresses should both be in the 10.1.0.0/16 address space. The remote worker will output to stdout when you run it. Scan for any errors. You should see the message woot! I'm online.
A successfully connected worker will show up on the Resource management page on the Partek Flow server. This can be reached from the main homepage or by clicking Resource management from the Settings page. Once you have confirmed the worker can connect, kill the remote worker (CTRL-C) from the terminal in which you started it.
Once everything is working, return to library file management and add the genomes/indices required by your research team. If Partek hosts these genomes/indices, these will automatically be downloaded by Partek Flow
In effect, all you are doing is submitting the following command as a batch job to bring up remote workers:
The second parameter for this script can be obtained automatically via:
Bring up workers by running the command below. You only need to run one worker per node:
Go to the Resource management page and click on the Stop button (red square) next to the worker you wish to shut down. The worker will shut down gracefully; that is, it will wait for currently running work on that node to finish, then shut down.
For a cluster update, Partek support will provide links to .zip files for Partek Flow and the remote Flow worker. All of the following actions should be performed as the Linux user that runs Flow. Do NOT run Flow as root.
Go to the Flow installation directory. This is usually the home directory of the Linux user that runs Flow and it should contain a directory named "partek_flow". The location of the Flow install can also be obtained by running ps aux | grep flow and examining the path of the running Flow executable.
Shut down Flow:
Download the new version of Flow and the Flow worker:
Make sure Flow has exited:
The flow process should no longer be listed.
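A quick scripted version of that check (pgrep is assumed to be available; the partek_flow pattern matches the install directory name used in this guide):

```shell
# Report whether any Flow (partek_flow) process is still alive before
# unpacking the update.
FLOW_STATE=$(pgrep -f partek_flow > /dev/null 2>&1 && echo running || echo stopped)
echo "Flow is $FLOW_STATE"
```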
Unpack the new version of Flow install and backup the old install:
Backup the Flow database folder. This should be located in the home directory of the user that runs Flow.
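A hedged sketch of the backup step. FLOW_HOME stands in for /home/flow, and the scaffold directory is created only so the tar invocation can be rehearsed anywhere; in production the real install and database folders already exist, so back up both.

```shell
# FLOW_HOME stands in for /home/flow. A scaffold partek_flow directory
# is created for rehearsal; in production, tar the real directories.
FLOW_HOME="${FLOW_HOME:-$PWD/flow-home-demo}"
mkdir -p "$FLOW_HOME/partek_flow"
STAMP=$(date +%Y%m%d)
tar -czf "$FLOW_HOME/partek_flow-backup-$STAMP.tar.gz" -C "$FLOW_HOME" partek_flow
ls "$FLOW_HOME"/partek_flow-backup-*.tar.gz
```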
Start the updated version of Flow:
(make sure there is nothing of concern in this file when starting up Flow. You can stop the file tailing by typing: CTRL-C)
You may also want to examine the main Flow log for errors:
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Note: This guide assumes that the items necessary for the Amazon Elastic Compute Cloud (EC2) instance, such as an Amazon Virtual Private Cloud (VPC), subnets, and security groups, do not yet exist, so their creation is covered as well.
Log in to the Amazon Web Services (AWS) management console at https://console.aws.amazon.com
Click on EC2
Switch to the region intended to deploy Partek Flow software. This tutorial uses US East (N. Virginia) as an example.
On the left menu, click on Instances, then click the Launch Instance button. The Choose an Amazon Machine Image (AMI) page will appear.
Click the Select button next to Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-f4cc1de2. NOTE: Please use the latest Ubuntu AMI. It is likely that the AMI listed here will be out of date.
Choose an Instance Type, the selection depends on your budget and the size of the Partek Flow deployment. We recommend m4.large for testing or cluster front-end operation, m4.xlarge for standard deployments, and m4.2xlarge for alignment-heavy workloads with a large user-base. See the section AWS instance type resources and costs for assistance with choosing the right instance. In most cases, the instance type and associated resources can be changed after deployment, so one is not locked into the choices made for this step.
NOTE: New instance types will become available. Please use the latest mX instance type provided as it will likely perform better and be more cost effective than older instance types.
On the Configure Instance Details page, make the following selections:
Set the number of instances to 1. An autoscaling group is not necessary for single-node deployments
Purchasing Option: Leave Request Spot Instances unchecked. This is relevant for cost-minimization of Partek Flow cluster deployments.
Network: If you do not have a virtual private cloud (VPC) already created for Partek Flow, click Create New VPC. This will open a new browser tab for VPC management.
Use the following settings for the VPC:
Name Tag: Flow-VPC
IPv4 CIDR block: 10.0.0.0/16
Select No IPv6 CIDR Block
Tenancy: Default
Click Yes, Create. You may be asked to select a DHCP Option set. If so, then make sure the dynamic host configuration protocol (DHCP) option set has the following properties:
Options: domain-name = ec2.internal;domain-name-servers = AmazonProvidedDNS;
DNS Resolution: leave the defaults set to yes
DNS Hostname: change this to yes as internal DNS resolution may be necessary depending on the Partek Flow deployment
Once created, the new Flow-VPC will appear in the list of available VPCs. The VPC needs additional configuration for external access. To continue, right click on Flow-VPC and select Edit DNS Resolution, select Yes, and then Save. Next, right click the Flow-VPC and select Edit DNS Hostnames, select Yes, then Save.
Make sure the DHCP option set is set to the one created above. If it is not, right-click on the row containing Flow-VPC and select Edit DHCP Option Sets.
Close the VPC Management tab and go back to the EC2 Management Console.
Click the refresh arrow next to Create New VPC and select Flow-VPC.
Click Create New Subnet and a new browser tab will open with a list of existing subnets. Click Create Subnet and set the following options:
Name Tag: Flow-Subnet
VPC: Flow-VPC
VPC CIDRs: This should be automatically populated with the information from Flow-VPC
Availability Zone: It is OK to let Amazon choose for you if you do not have a preference
IPv4 CIDR block: 10.0.1.0/24
Stay on the VPC Dashboard Tab and on the left navigation menu, click Internet Gateways, then click Create Internet Gateway and use the following options:
Name Tag: Flow-IGW
Click Yes, Create
The new gateway will be displayed as Detached. Right click on the Flow-IGW gateway and select Attach to VPC, then select Flow-VPC and click Yes, Attach.
Click on Route Tables on the left navigation menu.
If it exists, select the route table already associated with Flow-VPC. If not, make a new route table and associate it with Flow-VPC. Click on the new route table, then click the Routes tab toward the bottom of the page. The route Destination = 10.0.0.0/16 Target = local should already be present. Click Edit, then Click Add another route and set the following parameters:
Destination: 0.0.0.0/0
Target set to Flow-IGW (the internet gateway that was just created)
Click Save
Close the VPC Dashboard browser tab and go back to the EC2 Management Console tab. Note that you should still be on Step 3: Configure Instance Details.
Click the refresh arrow next to Create New Subnet and select Flow-Subnet.
Auto-assign Public IP: Use subnet setting (Disable)
Placement Group: No placement group
IAM role: None.
Note: For multi-node Partek Flow deployments or instances where you would like Partek to manage AWS resources on your behalf, please see Partek AWS support and set up an IAM role for your Partek Flow EC2 instance. In most cases a specialized IAM role is unnecessary and we only need instance ssh keys.
Shutdown Behaviour: Stop
Enable Termination Protection: select Protect against accidental termination
Monitoring: leave Enable CloudWatch Detailed Monitoring disabled
EBS-optimized Instance: Make sure Launch as EBS-optimized Instance is enabled. Given the recommended choice of an m4 instance type, EBS optimization should be enabled at no extra cost.
Tenancy: Shared - Run a shared hardware instance
Network Interfaces: leave as-is
Advanced Details: leave as-is
Click Next: Add Storage. You should be on Step 4: Add Storage
For the existing root volume, set the following options:
Size: 8 GB
Volume Type: Magnetic
Select Delete on Termination
Note: All Partek Flow data is stored on a non-root EBS volume. Since only the OS is on the root volume and not frequently re-booted, a fast root volume is probably not necessary or worth the cost. For more information about EBS volumes and their performance, see the section EBS volumes.
Click Add New Volume and set the following options:
Volume Type: EBS
Device: /dev/sdb (take the default)
Do not define a snapshot
Size (GiB): 500
Note: This is the minimum for ST1 volumes, see: EBS volumes
Volume Type: Throughput optimized HDD (ST1)
Do not delete on terminate or encrypt
Click Next: Add Tags
You do not need to define any tags for this new EC2 instance, but you can if you would like.
Click Next: Configure Security Group
For Assign a Security Group select Create a New Security Group
Security Group Name: Flow-SG
Description: Security group for Partek Flow server
Add the following rules:
SSH set Source to My IP (or the address range of your company or institution)
Click Add Rule:
Set Type to Custom TCP Rule
Set Port Range to 8080
Set Source to anywhere (0.0.0.0/0, ::/0)
Note: It is recommended to restrict Source to just those that need access to Partek Flow.
Click Review and Launch
The AWS console will suggest that this server not be booted from a magnetic volume. Since there is not a lot of IO on the root partition and reboots will be rare, choosing Continue with Magnetic will reduce costs. Choosing an SSD volume will not provide substantial benefit, but it is OK if one wishes to use an SSD volume. See the EBS Volumes section for more information.
Click Launch
Create a new keypair:
Name the keypair Flow-Key
Download this keypair, then run chmod 600 Flow-Key.pem (the downloaded key) so it can be used.
Back up this key, as you may lose access to the Partek Flow instance without it.
The new instance will now boot. Use the left navigation bar and click on Instances. Click the pencil icon and assign the instance the name Partek Flow Server
The server should be assigned a fixed IP address. To do this, click on Elastic IPs on the left navigation menu from the EC2 Management Console.
Click Allocate New Address
Assign Scope to VPC
Click Allocate
On the table containing the newly allocated elastic IP, right click and select Associate Address
For Instance, select the instance named Partek Flow Server
For Private IP, select the one private IP available for the Partek Flow EC2 instance, then click Associate
Note: For the remaining steps, we refer to the elastic ip as elastic.ip
SSH to the new Flow-Server instance:
Attach, format, and move the ubuntu home directory onto the large ST1 elastic block store (EBS) volume. All Partek Flow data will live in this volume. Consult the AWS EC2 documentation for further information about attaching EBS volumes to your instance.
Note: Under Volumes in the EC2 management console, inspect Attachment Information. It will likely list the large ST1 EBS volume as attached to /dev/sdb. Replace "s" with "xv" to find the device name to use for this mkfs command.
Make a note of the newly created UUID for this volume
Copy the ubuntu home directory onto the EBS volume using a temporary mount point:
Make the EBS volume mount at system boot:
Add the following to /etc/fstab: UUID=the-UUID-from-the-mkfs-command-above /home ext4 defaults,nofail 0 2
Disconnect the ssh session, then log in again to make sure all is well
Note: For additional information about Partek Flow installations, see our generic Installation Guide
Before beginning, send the media access control (MAC) address of the EC2 instance to licensing@partek.com. The output of ifconfig will suffice. Given this information, Partek employees will create a license for your AWS server. MAC addresses remain the same after stopping and starting the Partek Flow EC2 instance. If the MAC address does change, let our licensing department know and we can add your license to our floating license server or suggest other workarounds.
Install required packages for Partek Flow:
Install Partek Flow:
Note: Make sure you are running as the ubuntu user.
Partek Flow has finished loading when you see INFO: Server startup in xxxxxxx ms in the partek_flow/logs/catalina.out log file. This takes ~30 seconds.
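The wait for that log line can be scripted. In the sketch below, the fallback log line (and its 28541 ms figure) is invented purely so the grep can be rehearsed when no real catalina.out exists:

```shell
# Look for the startup marker in the Tomcat log; fall back to a fake
# stand-in log line (rehearsal only) if the real log is absent.
LOG="partek_flow/logs/catalina.out"
if [ ! -f "$LOG" ]; then
  LOG=$(mktemp)
  echo "INFO: Server startup in 28541 ms" > "$LOG"
fi
grep -m1 "Server startup" "$LOG" && echo "Flow is up"
```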
Alternative: Install Flow with Docker. Our base packages are located here: https://hub.docker.com/r/partekinc/flow/tags
Open Partek Flow with a web browser: http://elastic.ip:8080/
Enter license key
Set up the Partek Flow admin account
Leave the library file directory at its default location and check that the free space listed for this directory is consistent with what was allocated for the ST1 EBS volume.
Done! Partek Flow is ready to use.
After the EC2 instance is provisioned, we are happy to assist with setting up Partek Flow or address other issues you encounter with the usage of Partek Flow. The quickest way to receive help is to allow us remote access to your server by sending us Flow-Key.pem and amending the SSH rule for Flow-SG to include access from IP 97.84.41.194 (Partek HQ). We recommend sending us the Flow-Key.pem via secure means. The easiest way to do this is with the following command:
We also provide live assistance via GoTo meeting or TeamViewer if you are uncomfortable with us accessing your EC2 instance directly. Before contacting us, please run $ ./partek_flow/flowstatus.sh to send us logs and other information that will assist with your support request.
With newer EC2 instance types, it is possible to change the instance type of an already deployed Partek Flow EC2 server. We recommend doing several rounds of benchmarks with production-sized workloads to evaluate whether the resources allocated to your Partek Flow server are sufficient. You may find that reducing the resources allocated to the Partek Flow server comes with significant cost savings, but it can cause UI responsiveness and job run-times to reach unacceptable levels. Once you have found an instance type that works, you may wish to use reserved instance pricing, which is significantly cheaper than on-demand instance pricing. Reserved instances come with 1 or 3-year usage terms. Please see the EC2 Reserved Instance Marketplace to sell or purchase existing reserved instances at reduced rates.
The network performance of the EC2 instance type becomes an important factor if the primary usage of Partek Flow is for alignment. For this use case, one will have to move copious amounts of data to the server (input fastq files) and back to the end users (output bam files), so it is important to have what AWS refers to as high network performance, which in most cases is around 1 Gb/s. If the focus is primarily on downstream analysis and visualization (e.g. the primary input files are ADAT) then network performance is less of a concern.
We recommend HVM virtualization, as we have not seen any performance impact from using it, and non-HVM instance types can come with significant deployment barriers.
Make sure your instance is EBS optimized by default and you are not charged a surcharge for EBS optimization.
T-class servers, although cheap, may slow responsiveness for the Partek Flow server and generally do not provide sufficient resources.
We do not recommend placing any data on instance store volumes since all data is lost on those volumes after an instance stops. This is too risky as there are cases where user tasks can take up unexpected amounts of memory forcing a server stop/reboot.
The values below were updated April 2017. The latest pricing and EC2 resource offerings can be found at http://www.ec2instances.info
Single server recommendation: m4.xlarge or m4.2xlarge
Network performance values for US-EAST-1 correspond to: Low ~ 50Mb/s, Medium ~ 300Mb/s, High ~ 1Gb/s.
Choice of a volume type and size:
This is dependent on the type of workload. For most users, Partek Flow server tasks are alignment-heavy, so we recommend a throughput optimized HDD (ST1) EBS volume since most aligner operations are sequential in nature. For workloads that focus primarily on downstream analysis, a general purpose SSD volume will suffice, but the costs are greater. For those who focus on alignment or host several users, the storage requirements can be high. ST1 EBS volumes have the following characteristics:
Max throughput 500 MiB/s
$0.045 per GB-month of provisioned storage ($22.5 per month for a 500 GB of storage).
Note that EBS volumes can be grown or performance characteristics changed. To minimize costs, start with a smaller EBS volume allocation of 0.5 - 2 TB as most mature Partek Flow installations generate roughly this amount of data. When necessary, the EBS volume and the underlying file system can be grown on-line (making ext4 a good choice). Shrinking is also possible but may require the Partek Flow server to be offline.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Partek Flow provides the infrastructure to isolate data from different users within the same server. This guide will provide general instructions on how to create this environment within Partek Flow. This can be modified to accommodate existing file systems already accessible to the server.
Go to Settings > Directory permissions and restrict parent folder access (typically /home/flow) to Administrator accounts only
Click the Permit access to a new directory button and navigate to the folder with your library files (typically /home/flow/FlowData/library_files). Select the All users (automatically updated) checkbox to permit all users (including those that will be added in the future) to see the library files associated with the Partek Flow server
Then go to System preferences > Filesystem and storage and set the Default project output directory to "Sample file directory"
Create your first user and select the Private directory checkbox. Specify where the private directory for that user is located
If needed, you can create a user directory by clicking Browse > Create new folder
This automatically sets browsing permissions for that private directory to that user
When a user creates a project, the default project output directory is now within their own restricted folder
More importantly, other users cannot see these files
Add additional users as needed
Flow ships with tasks that do not have all of their dependencies included. On startup Flow will attempt to install the dependencies, but not every system is equipped to install them.
In the case of any difficulties, it is highly recommended to use a Docker deployment instead (cluster installations may require Singularity instead, which is still somewhat a work in progress)
Requires Python 2.7 or later.
On startup Flow will attempt to install additional python packages using the command
Requires R 3.2.3 or later.
On startup Flow will attempt to install additional R packages.
There are cascading dependencies, but you can view the core libraries in partek_flow/bin/cnvkit-0.8.5/install.R
If these packages can't be built locally, it may be possible for the user to download them from us (see below).
Requires R 3.0 or later.
On startup Flow will attempt to install additional R packages.
There are cascading dependencies, but you can view the core libraries in partek_flow/bin/deseq_two-3.5/install.R
If these packages can't be built locally, it may be possible for the user to download them from us (see below).
RcppArmadillo may also have dependencies on multi-threading shared objects that may not be on the LD_LIBRARY_PATH
The recommendation is to copy those .so files to a folder and make sure it is available from the LD_LIBRARY_PATH when the server/worker starts.
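As a sketch, the shared objects can be collected into one directory and prepended to LD_LIBRARY_PATH in the server/worker startup environment (the folder name here is hypothetical):

```shell
# Folder holding the copied multi-threading .so files (hypothetical path)
FLOW_SO_DIR=/opt/flow-libs
# e.g. cp /path/to/libgomp.so* "$FLOW_SO_DIR"/   (run once, as an administrator)
export LD_LIBRARY_PATH="$FLOW_SO_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```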
Additional dynamic libraries (such as libxml2.so) may be missing and we can provide a copy appropriate for the target OS.
Requires Python 2.7 or 3.4 or above
On startup Flow attempts to install using pip
Requires Python 3.0 or above
If there are any conflicts with preinstalled python packages, Flow should be configured to run with its own virtual environment:
or
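The exact commands are deployment-specific, but creating a dedicated virtual environment can be sketched as follows (the path is an example; in practice pick a persistent location):

```shell
# Create an isolated virtual environment for Flow's python tasks
VENV_DIR=/tmp/flow-venv      # example location only
python3 -m venv "$VENV_DIR"
# Activating it (or pointing Flow at its interpreter) keeps Flow's
# packages separate from any preinstalled system packages.
. "$VENV_DIR/bin/activate"
```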
R can usually be installed from the package manager. If the user installs Flow via apt or yum it should already be installed.
Currently, we offer a set of R packages compatible with some versions of R
Extract this file in the home directory. (Make .R a symlink if the home directory doesn't have enough free space)
These packages include the dependencies for both CNVkit and DESeq2
When running R diagnostic commands outside flow, it can simplify things if the environment includes a reference to the ~/.R folder:
or load
in ~/.Rprofile
list loaded packages:
get the version:
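For example (assuming the provided package set was extracted to ~/.R), the environment reference and the diagnostic commands might look like this; the package name DESeq2 is just one of the packages mentioned above:

```shell
# Make R look in ~/.R for libraries when run outside Flow
export R_LIBS_USER="$HOME/.R"
# List loaded packages
R -e 'sessionInfo()'
# Get the version of one package
R -e 'packageVersion("DESeq2")'
```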
This is a compiled Perl script (so it has no direct dependency on Perl itself); we have had one report (istem.fr) of it failing to run.
DECoN requires R version 3.1.2
It must be installed under /opt/R-3.1.2 or set the DECON_R environment variable to its folder
Download DECoN
and install it under /opt/DECoN or set the DECON_PATH environment variable to its folder
You may need to add
to Linux/packrat/packrat.opts
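The environment variables mentioned above can be set in the server/worker startup environment, for example:

```shell
# Point Flow at the dedicated R build and the DECoN install folder
export DECON_R=/opt/R-3.1.2     # or wherever R 3.1.2 is installed
export DECON_PATH=/opt/DECoN    # or wherever DECoN was installed
```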
Below are the yaml documents which describe the bare minimum infrastructure needed for a functional Flow server. It is best to start with a single-node proof of concept deployment. Once that works, the deployment can be extended to multi-node with elastic worker allocation. Each section is explained below.
On a kubernetes cluster, all Flow deployments are placed in their own namespace, for example namespace: partek-flow. The label app.kubernetes.io/name: flowheadnode allows binding a service to this headnode pod, or targeting other kubernetes infrastructure at it. The label deployment: dev allows running multiple Flow instances in this namespace (dev, tst, uat, prd, etc.) if needed and allows workers to connect to the correct headnode. For stronger isolation, running each Flow instance in its own namespace is optimal.
The Flow docker image requires: 1) a writable volume mounted to /home/flow; 2) that this volume be readable and writable by UID:GID 1000:1000; 3) for a multi-node setup, that this volume be cross-mounted to all worker pods. In that case, the persistent volume would be backed by a network storage device such as EFS, NFS, or a mounted FileGateway.
This section achieves goal 2)
The flowconfig volume is used to override behavior for custom Flow builds and custom integrations. It is generally not needed for vanilla deployments.
Partek Flow is shipped as a single docker image containing all necessary dependencies. The same image is used for worker nodes. Most deployment-related configuration is set as environment variables. Auxiliary images are available for additional supporting infrastructure, such as flexlm and worker allocator images.
Official Partek Flow images can be found on our release notes page: Release Notes The image tags assume the format: registry.partek.com/rtw:YY.MMMM.build New rtw images are generally released several times a month. The image in the example above references a private ECR. It is highly recommended that the target image from registry.partek.com be loaded into your ECR. Image pulls will be much faster from AWS - this reduces the time to dynamically allocate workers. It also removes a single point of failure - if registry.partek.com were down it would impact your ability to launch new workers on demand.
Partek Flow uses the head node to handle all interactive data visualization, which requires additional CPU resources; the more the better, and 8 is a good starting point. As for memory, we recommend 8 to 16 GiB. Resource limits are not included here, but are set to large values globally:
Partek Flow uses FlexLM for licensing. Currently we do not offer or have implemented any alternative. Values for this environment variable can be:
An external FlexLM server. We provide a Partek-specific container image and detail a kubernetes deployment for it below. This license server can also live outside the kubernetes cluster; the only requirement is that it is network accessible.
/home/flow/.partekflow/license/Partek.lic - use this path exactly. This path is internal to the headnode container and is persisted on a mounted PVC.
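Both forms are plain values for the PARTEKLM_LICENSE_FILE environment variable, for example:

```shell
# Point Flow at the flexlmserver service...
export PARTEKLM_LICENSE_FILE="@flexlmserver"
# ...or at a license file persisted on the headnode's mounted PVC:
# export PARTEKLM_LICENSE_FILE="/home/flow/.partekflow/license/Partek.lic"
```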
Unfortunately, FlexLM is MAC address based and does not quite fit in with modern containerized deployments. There is no straightforward or native way for kubernetes to set the MAC address upon pod/container creation, so using a license file on the flowheadnode pod (/home/flow/.partekflow/license/Partek.lic ) could be problematic (but not impossible). In further examples below, we provide a custom FlexLM container that can be instantiated as a pod/service. This works by creating a new network interface with the requested MAC address inside the FlexLM pod.
Please leave this set at "1". Partek Flow need not enforce any limits as that is the responsibility of kubernetes. Setting this to anything else may result in Partek executables hanging.
This is a hodgepodge of Java/Tomcat options. Parts of interest:
It is possible for the Flow headnode to execute jobs locally in addition to dispatching them to remote workers. These two options set resource limits on the Flow internal worker to prevent resource contention with the Flow server. If remote workers are not used and this remains a single-node deployment, meaning ALL jobs will execute on the internal worker, then it is best to remove the CPU limit (-DFLOW_WORKER_CORES) and only set -DFLOW_WORKER_MEMORY_MB equal to the kubernetes memory resource request.
If Flow connects to a corporate LDAP server for authentication, it will need to trust the LDAP certificates.
JVM heap size. If the internal worker is not used, set this to be a little less than the kubernetes memory resource request. If the internal worker is in use, and the intent is to stay with a single-node deployment, then set this to ~25% of the kubernetes memory resource request, but no less than ~4 GiB.
The flowheadnode service is needed 1) so that workers have a DNS name (flowheadnode) to connect to when they start and 2) so that we can attach an ingress route to make the Flow web interface accessible to end users. The app.kubernetes.io/name: flowheadnode selector is what binds this to the flowheadnode pod.
80:8080 - Users interact with Flow entirely over a web browser
2552:2552 - Workers communicate with the Flow server over port 2552
8443:8443 - Partek executed binaries connect back to the Flow server over port 8443 to do license checks
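Putting the three ports and the selector together, a minimal flowheadnode Service might be sketched like this (namespace and labels follow the examples in this guide; your manifest will differ in detail):

```shell
# Write a sketch of the flowheadnode Service manifest
cat > /tmp/flowheadnode-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: flowheadnode
  namespace: partek-flow
spec:
  selector:
    app.kubernetes.io/name: flowheadnode
  ports:
    - name: http       # web browser access
      port: 80
      targetPort: 8080
    - name: worker     # worker communication
      port: 2552
      targetPort: 2552
    - name: license    # license checks from Partek binaries
      port: 8443
      targetPort: 8443
EOF
```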
This provides external users HTTPS access to Flow at host: flow.dev-devsvc.domain.com. Your details will vary. This is where we bind to the flowheadnode service.
The yaml documents above will bring up a complete Partek-specific license server.
Note that the service name is flexlmserver. The flowheadnode pod connects to this license server via the PARTEKLM_LICENSE_FILE="@flexlmserver" environment variable.
You should deploy this flexlmserver first, since the flowheadnode will need it available in order to start in a licensed state.
Partek will send a Partek.lic file licensed to some random MAC address. When this license is (manually) written to /usr/local/flexlm/licenses, the pod will continue execution by creating a new network interface using the MAC address in Partek.lic, then it will start the licensing service. This is why the NET_ADMIN capability is added to this pod.
The license from Partek must contain VENDOR parteklm PORT=27001 so the vendor port remains at 27001 in order to match the service definition above. Without this, this port is randomly set by FlexLM.
This image is currently available from public.ecr.aws/partek-flow/kube-flexlm-server but this may change in the future.
Before performing updates, we recommend Backing Up the Database.
Updates are applied using the Linux package manager.
Make sure Partek Flow is stopped before updating it.
To update Partek Flow, open a terminal window and enter the following command.
For Debian/Ubuntu, enter:
For Redhat/Fedora/CentOS, enter:
For the YUM package manager, if updating Partek Flow fails with a message claiming "package not signed," enter:
Note that our packages are signed and the message above is erroneous.
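The package-manager commands are roughly as follows; the package name partekflow is an assumption based on the configuration file names used elsewhere in this guide:

```shell
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install partekflow
# Redhat/Fedora/CentOS
sudo yum update partekflow
# If yum wrongly reports "package not signed":
sudo yum update partekflow --nogpgcheck
```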
For tomcat build update, download the latest version from below:
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
JKS, or Java KeyStore, is used in Flow in a few specific scenarios where asymmetric encryption is needed.
Partek Flow ships with its own Java keystore, found at .../partek_flow/distrib/flowkeystore, where you may want to add your public and private certificates.
If you already have a certificate please skip to the next step.
Please place the key in a secure folder (it is advisable to place it in Flow's home directory, e.g. /home/flow/keys).
The commands above are meant to be used in a terminal. There are other ways to create a certificate, but they will not be mentioned here.
If you wish to understand the flags used above, please refer to the OpenSSL documentation.
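If you still need a certificate, a self-signed key/certificate pair can be generated with OpenSSL along these lines (file names and the subject are placeholders):

```shell
# Generate a 2048-bit private key and a self-signed certificate valid ~1 year
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /home/flow/keys/flow.key \
  -out /home/flow/keys/flow.crt \
  -days 365 -subj "/CN=flow.example.com"
```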
For this step you will have to find where the cacerts file is located; it is under the Java installation. If you do not know how to find it, contact us and we can help.
In the example the cacerts file is located at /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts
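A typical import into that cacerts file with keytool might look like this (the alias and certificate file name are placeholders; changeit is the default Java truststore password):

```shell
sudo keytool -importcert -alias flowkey \
  -file /home/flow/keys/flow.crt \
  -keystore /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts \
  -storepass changeit
```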
We need to tell Partek Flow where the key is located. To do this, we will edit a file which contains some of the Flow settings.
The file is usually located at /etc/partekflow.conf. If you do not have this file, we advise using the .bashrc file of the system user that runs Partek Flow.
At the end of that file please add:
Docker can be used with Partek Flow to deploy an easy-to-maintain environment which avoids dependency issues and can easily be relocated to a different server if needed.
One can follow the Docker documentation to install and get started.
“Docker is a platform for developers and sysadmins to build, run, and share applications with containers. The use of containers to deploy applications is called containerization. “
This command will output the details of the currently running containers including port forwarding, container name/id, and uptime.
This command will allow us to enter the running container's environment to troubleshoot any issues we might have with the container. (Containers are not meant to be changed; the correct way to deal with any issues is to create a new container after troubleshooting.)
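The two commands described above correspond to the standard docker ps and docker exec:

```shell
# Show running containers, their names/IDs, port forwarding, and uptime
docker ps
# Open a shell inside a running container (the container name is an example)
docker exec -it partekflow /bin/bash
```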
“Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.“
Below is an example of a docker-compose.yml file which can be used to bring up a Partek Flow server with an extra worker.
These are some of the important tags shown above:
restart: whether you want the container to be restarted automatically upon failure or system restart.
image: the image we distribute and the desired version. We always recommend running the latest version of the software, but if you need a specific version of Partek Flow please visit here.
environment: here we set up any environment variables to be run along the container.
ports: the default port for Partek Flow is 8080. If you wish to change the port it is accessible on, change the first part (left of the colon) of 8080:8080. For example, to access the server on port 8888, the correct format is 8888:8080
mac_address: this needs to match your license file
volumes: in this section we specify the folders on the server to be shared with the container; this way the files created and used in the container persist and remain accessible.
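Putting the tags above together, a minimal compose file might look like the sketch below; the image tag, MAC address, environment variable, and host path are placeholders to substitute with your own values:

```shell
# Write a minimal docker-compose.yml sketch for a Partek Flow server
cat > /tmp/docker-compose.yml <<'EOF'
services:
  partekflow:
    image: registry.partek.com/rtw:latest   # pin a specific YY.MMMM.build in practice
    restart: unless-stopped
    mac_address: "02:42:ac:11:00:02"        # must match your license file
    environment:
      - CATALINA_OPTS=-Xmx8g                # example environment variable only
    ports:
      - "8080:8080"                         # use "8888:8080" to serve on port 8888
    volumes:
      - /home/flow:/home/flow               # persist data outside the container
EOF
```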
Open a terminal window and enter the following command.
Debian/Ubuntu:
RedHat/Fedora/CentOS:
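The removal commands are roughly as follows; the package name partekflow is an assumption based on the configuration file names used elsewhere in this guide:

```shell
# Debian/Ubuntu
sudo apt-get remove partekflow
# RedHat/Fedora/CentOS
sudo yum remove partekflow
```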
The uninstall removes binaries only (/opt/partek_flow). The logs, database (partek_db), and files in the /home/flow/.partekflow folder will remain unaffected.
Stop and quit Partek Flow using the Partek Flow app in the menu.
Using Finder, delete Flow application from the Applications menu.
Figure 1. Control of Partek Flow through the menu bar
This process does not delete data or the library files. Users who wish to delete those can delete them using Finder or terminal. The default location of project output files and library files is the /FlowData directory under the user's home folder. However, the actual location may vary depending on your System or Project specific settings.
Instance Type | Memory | Cores | EBS throughput | Network Performance | Monthly cost |
---|---|---|---|---|---|
m4.large | 8.0 GB | 2 vCPUs | 56.25 MB/s | Medium | $78.84 |
r4.large | 15.25 GB | 2 vCPUs | 50 MB/s | High (+10G interface) | $97.09 |
m4.xlarge | 16.0 GB | 4 vCPUs | 93.75 MB/s | High | $156.95 |
r4.xlarge | 30.5 GB | 4 vCPUs | 100 MB/s | High | $194.18 |
m4.2xlarge | 32.0 GB | 8 vCPUs | 125 MB/s | High | $314.63 |
r4.2xlarge | 61.0 GB | 8 vCPUs | 200 MB/s | High (+10G interface) | $388.36 |
For older operating systems R is not available and will need to be installed from
DECoN comes pre-installed in the flow_dna container
Documentation on installing DECoN is available here:
See also:
Partek will work with the customer to make a docker-compose file that will have all the configuration necessary to run Partek Flow on any machine that meets our .
Here you will find videos of past live training webinars.
Using a web browser, log in to Partek Flow. From the Home page click the New Project button; enter a project name (Figure 1) and then click Create project.
The Project name is the basis of the default name of the output directory for this project. Project names are unique, thus a new project cannot have the same name as an existing project within the same Partek Flow server.
Once a new project has been created, the user is automatically directed to the Analysis tab of the Project View (Figure 2).
Partek Flow software manages separate experiments as projects. A complete project consists of input data, tasks used to analyze the data, the resulting output files, and a list of users involved in the analysis.
This chapter provides instructions in creating and analyzing a project and covers:
Partek Flow tutorials provide step-by-step instructions using a supplied data set to teach you how to use the software tools. Upon completion of each tutorial, you will be able to apply your knowledge in your own studies.
The Partek Flow Metadata Tab has an option to import data, and is where sample/cell attributes are managed. This is also where users can modify the location of the project output folder.
The Metadata tab can be used to import data. To add samples to the project, click Add data under Import, different import options are displayed using the cascading menu (Figure 1).
This method adds samples by creating them simultaneously as the data gets imported into a project. The sample names are assigned automatically based on filenames.
Before proceeding, it is ideal that you have already transferred the data you wish to analyze to a folder (with appropriate permissions) within the Partek Flow server. Please seek assistance from your system administrator in uploading your data directly.
Select the Automatically create samples from files button. The next screen will feature a file browser that will show any folders you have access to in the Partek Flow server (Figure 2). Select a folder by clicking the folder name. Files in the selected folder that have file formats that can be imported by Partek Flow will be displayed and tick-marked on the right panel. You can exclude some files from the folder by unselecting the check mark on the left side of the filename. When you have made your selections, click the Create sample button.
Alternatively, files can also be uploaded and imported into the project from the user's local computer - only use this option if your file size is less than 500 MB. Select the My computer radio button (Figure 3) and the options of selecting the local file and the upload (destination) directory will appear. Only one file at a time can be imported to a project using this method.
Multiple data files can be compressed into a single .zip file before uploading. Partek Flow will automatically unzip the files and put them in the upload directory.
Please be aware that the use of the method illustrated in Figure 3 depends heavily on the speed and latency of the Internet connection between the user's computer and the server where Partek Flow is installed. Given the large size of most genomics data sets, it is not recommended in most cases.
After successful creation of samples from files, the Data tab now contains a Sample management table (Figure 4). The Sample name column in the table is automatically generated based on the filenames and the table is sorted in alphabetical order.
Clicking on the **Show data files** link on the lower right side of the sample management table will expand the table and reveal the filenames of the files associated with each sample. Conversely, clicking on Hide data files will hide the file information.
The columns in the expanded view show the files associated with each sample. Files are organized by file type. Any filename extensions that indicate compression (such as .gz) are not shown.
Once a sample is created in a project, the files associated with it can be modified. In the expanded view, mouse over the +/- column of a sample. The highlighted icons will correspond to the options for the sample on that row.
Samples can be added one at a time by selecting the Create a new blank sample option (Figure 5). In the following dialog box, type a sample name and click Create. This process creates a sample entry in the sample management table but there is no associated file with it, hence it is a "blank sample."
Expanding the Sample management table by clicking Show data files on the lower left corner of the table will reveal the option to associate files to the blank sample.
Alternatively, if you have a matrix of data, such as raw read count data in text format, select Import count matrix. The requirements of this text file are listed below:
The file contains numeric values in a tab-delimited format; samples can be on rows while features (e.g. gene names) are on columns, or vice versa
The file contains unique sample IDs and feature IDs
If the data contains sample attribute information, all these attributes have to be either
The leftmost columns when samples are on rows (Figure 6)
The first few rows when samples are on columns (Figure 7)
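For illustration, a tiny samples-on-rows matrix that satisfies the requirements above could be created like this; all sample, attribute, and gene names are made up:

```shell
# Tab-delimited: unique sample IDs in the first column, one attribute
# column (Treatment), then feature (gene) IDs - all names hypothetical.
printf 'SampleID\tTreatment\tGeneA\tGeneB\tGeneC\n' >  /tmp/counts.txt
printf 'S1\tControl\t10\t0\t250\n'                  >> /tmp/counts.txt
printf 'S2\tTreated\t7\t3\t198\n'                   >> /tmp/counts.txt
```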
Like all other input files, you can upload the file from the Partek Flow server, My Computer or via a URL. Uploading the file brings up a file preview window (Figure 8). The preview of the first few rows and columns of the text file should help you determine on which rows/columns the relevant counts are located (the preview will display up to 100 rows and 100 columns). Inspect the text preview and indicate the orientation of the text file under File format>Input format.
If the read counts are based on a compatible annotation file in Partek Flow, you can specify that annotation file under Gene/feature annotation. Select the appropriate genome build and annotation model for your count data. Select the Contain sample attributes checkbox if your data includes additional sample information.
The example above shows a text file with samples listed on rows. The gene IDs are compatible with the hg19 RefSeq Transcripts - 2016-08-01 annotation model. Under the Column information and Row information sections, indicate the location of the Sample ID, which in this case is on Column 1. Indicate where the sample attributes start, which in this example is Column 2. Mark the Feature ID, which in this case is gene IDs starting at Column 4.
If the data has been log transformed, specify the base under Counts format.
The project output directory is the folder within the Partek Flow server where all output files produced during analysis will be stored.
The default directory is configured by the Partek Flow Administrator under the Settings menu (under System Preferences > Default project output directory).
If the user does not override the default, the task output will go to a subdirectory with the name of the Project.
After samples have been added in the project, additional information about the samples can be added. Information such as disease type, age, treatment, or sex can be annotated to the data by assigning the Attributes for each sample.
Certain tasks in Partek Flow, such as Gene-Specific Analysis, require that samples be assigned attributes in order to do statistical comparisons between groups or samples. As attributes are added to the project, additional columns in the sample management table will be created.
Attributes can be managed or created within a project. Under the Data tab, click the button to open the Manage attributes page (Figure 9).
To prepare for later data analysis using statistical tools, attributes can either be categorical or numeric (i.e., continuous).
For categorical attributes, there are two levels of visibility. Project-specific categorical attributes are visible only within the current project. System-wide categorical attributes are visible across all the projects within the Partek Flow server, and are useful for maintaining uniformity of terms. Importing samples in a new project will retain the system-wide attributes, but not the project-specific attributes.
A feature of Partek Flow is the use of controlled vocabulary for categorical attributes, allowing samples to be assigned only within pre-defined categories. It was designed to effectively manage content and data and allow teams to share common definitions. The use of standard terms minimizes confusion.
To add a categorical attribute in the Manage attributes page, click the Add new attribute (Figure 10). In the dialog box, type a Name for the attribute, select the Categorical radio button next to Attribute type, select the visibility of the attribute and then click the Add button.
Repeat the process for additional attributes of the samples in your study. When done, click Back to sample management table. Categorical attributes will default to Project-specific visibility.
Click and drag an attribute name to change the order of the attributes displayed on the Data tab. Click and drag a group name vertically to change the order of the group names, which is reflected in visualizations.
To add a numeric attribute in the Manage attributes page, click the Add new attribute button. In the dialog box (Figure 13), type a Name for the attribute, select the Numeric radio button next to Attribute type, and then click the Add button. Some optional parameters for numeric attributes include the Minimum value, Maximum value, and Units. When done, click Add to return to the Manage attributes page. Repeat the process to add more numeric attributes. When done, click Back to sample management table.
Since system-wide attributes do not have to be created by the current user, they only need to be added to the sample management table in a project.
In the Data tab, click the Add a system-wide attribute button. In the dialog box that follows (Figure 14), a drop-down menu is located next to Add attribute where you can select the system-wide attribute you would like to add to the project. Once selected, it will automatically be recognized as either a categorical or a numeric attribute.
For a system-wide categorical attribute, the different categories are listed and you have the option of pre-filling the columns with N/A (or any other category within the attribute). Click Add column and you will return to the Data tab.
After adding all the desired attributes to a project, the sample management table will show a new column for each attribute (Figure 15). The columns will initially read "N/A", as the samples have not yet been categorized or assigned a value. To edit the table, click Edit attributes. Assign the sample attributes by using a drop-down for categorical attributes (controlled vocabulary) or typing with a keyboard for numeric attributes.
When all the attributes have been entered, click Apply changes and the sample management table will be updated. After editing the sample table, make sure there are no fields with blank or N/A values before proceeding. To rename or delete attributes, click Manage attributes from the Data tab to access the Manage attributes page.
Another way to assign attributes to samples in the Data tab is to use a text file that contains the table of attributes and categories/values. This table is prepared outside of Partek Flow using any text editing software capable of saving tab-separated text files.
Using a text editor, prepare a table containing the attributes. An example is shown in Figure 16. There should only be one tab between columns with no extra tabs after the last column. In this particular example, the first column contains the filename and the text file is saved as Sampleinfo.txt.
The first row of the table in the text file contains the attributes (as headers). The first column of the table in the text file, regardless of the header of the first column, should contain either the sample names or the file names of the samples already added in Partek Flow. The first column is the unique identifier that will match the samples to the correct values or categories.
To upload sample attributes, click Assign sample attributes from a file in the Data tab. Then indicate where the attribute text file is stored and navigate to it. Partek Flow will parse the text file and present attributes that will be available for import (Figure 17).
Select the attributes you want to import by clicking the Import check box. Imported attributes that do not currently exist in the project will create new project-specific attributes.
You can change the name of a specific attribute by editing the Attribute name text box. Columns containing letter characters are automatically selected as categorical attributes. Columns containing numbers are suggested to be numeric attributes and can be changed to categorical using the drop down menu under Attribute type.
The first column is always the unique identifier and can refer only to File names or Sample names.
If using Sample names in the first column, they must match the entries of the Sample name column in the Sample management table.
If using File names in the first column, use the filenames shown in the fastq column of the expanded sample management table (see Figure 4) then add the extension .gz. All filenames must include the complete file extension (e.g., Samplename.fastq.gz).
The header name of the first column of the table (top left cell of our text table) is irrelevant but should not be left blank. Whether the first column contains File names or Sample names will be chosen during the process.
The last column cannot have empty values
Missing data (blank cells) can only be handled if the attribute is numeric. If it is categorical, please put a character in it.
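Following the rules above, a minimal attribute file might look like the sketch below; the file names, attribute names, and values are all hypothetical:

```shell
# A made-up Sampleinfo.txt: header row of attribute names, first column
# holds the file names of samples already in the project, one tab between
# columns and no trailing tabs.
printf 'File name\tTreatment\tAge\n'      >  /tmp/Sampleinfo.txt
printf 'SampleA.fastq.gz\tControl\t35\n'  >> /tmp/Sampleinfo.txt
printf 'SampleB.fastq.gz\tTreated\t42\n'  >> /tmp/Sampleinfo.txt
```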
It is advisable to use Sample name as the first column identifier when:
Samples are associated with more than one file (for instance, paired-end reads and/or technical replicates).
The files were imported in the SRA format (from the Sequence Read Archive database). In Partek Flow, they are automatically converted to the FASTQ format. Consequently, their filenames would change once they are imported. The new file names can be seen by expanding the sample management table, the new extension would be .fastq.gz.
If attributes are assigned from two different text files, the following will happen:
If the previous attributes have the same header and type (both are either categorical or numeric), the values are overwritten.
If there are different/additional headers on the "second round" of assignment, these new attributes will be appended to the table.
For numeric attributes, a "blank" value will not override a previous value.
The attributes assigned to the samples within the Data Tab will be associated with the samples throughout the project. During the course of analysis, Partek Flow tasks generate various tables and any attributes associated with a sample can be included in the table as optional columns. An example is shown in Figure 18 for a pre-alignment QA/QC report where the Optional columns link on the top left of the table reveal the different sample attributes.
In the Data tab, each sample can be renamed or deleted from the project by clicking the gear icon next to the sample name. The gear icon is readily visible upon mouse over (Figure 19). A sample can only be deleted if no analysis has been performed on the data yet. If any analysis has been performed on the data node, the delete sample operation is hidden; instead, you can filter samples in downstream analysis if you want to exclude certain samples. Deleting a sample from a project does not delete the associated files, which will remain on the disk.
The Attachments tab allows the project owner to add external (i.e. non-Partek Flow) files to a project (for instance, spreadsheets, Word documents, manuscripts). To attach a file, go to the Attachments tab (Figure 1). The Choose File button invokes the file browser showing the directory structure of the local computer. Select the file that you want to attach and then click on the Upload attachment button. For security reasons, Partek Flow will not allow you to add an executable file.
All added files will be listed in the table under the Attachments tab (Figure 2). The tab will also display file sizes, the user name of the person who uploaded the file and the time it was uploaded. Note that uploaded files will count towards the total size of the project, and thus, if available, to the disk quota of the project owner.
To remove a file, click the icon. To download the attachment, click the icon.
Click the green icon ( ) to associate additional files or the red icon ( ) to dissociate a file from a sample. You can manually associate multiple files with one sample. Dissociating a file from a sample does not delete the file from the Partek Flow server.
Mouse over the +/- column and click the green icon ( ) to associate file(s) with the sample. Repeat the process for every sample in your project.
The user has the option of specifying an existing folder or creating a new one as the project output directory. To do so, click the icon next to the directory and specify or create a new folder in the dialog box.
Individual categories for the attribute must then be entered. Enter a name in the New category text box and click Add (Figure 11). The name of the new category will appear in the table. The category can also be edited by clicking or deleted by clicking (visible on mouse-over). Repeat to add additional categories within the attribute.
The Log tab contains a table of the tasks that are running, scheduled, or completed within Partek Flow (Figure 1). It provides an overview of task progress, enables task management, and links to detailed reports for each task.
Each row of the table corresponds to a task node in the Analyses tab. The list can be sorted according to a specific column using the sort icon .
The Task column lists the names of the tasks. To the left of each task name is a colored circle indicating the layer of the task. The column is searchable by task name. Clicking a task name opens the Task report page; if the task did not generate a report, the link will go to the Task details page.
The User column identifies the task owner. Aside from the user who created the project, only collaborators and users with admin privileges can start tasks in a project. Clicking a name in the User column will display the corresponding User profile.
The End column shows when the task was completed. It will show the actual time for completed tasks, and the estimated time for running tasks. These estimates improve in accuracy as more tasks are completed in the current Partek Flow instance.
The Status column displays the current status of the task, such as Waiting, Running, Done, Canceled. If the task is currently running, a status progress bar will appear in the column. Once completed, the status of a task will be Done and the End column will be updated with the completion time.
A waiting task may be waiting for upstream tasks to complete ( ) or waiting for more computing resources to be available ( ).
The Action column contains the cancel button ( ) while a task is queued or running. Clicking this button will cancel the task. A trash icon ( ) will appear in the Action column for completed, canceled, or failed tasks, allowing the task to be deleted from the project. Deleting a task in the Queue tab will remove the corresponding nodes in the Analyses tab. Unless they have admin privileges, users may only cancel and delete tasks that they started. The User, End, and Status columns may be used to filter ( ) the table.
The Project details section shows the Name of the project as well as (optional) a textual project Description and a Thumbnail (picture) (Figure 1).
The owner and collaborators (if any) can customize the Description and Thumbnail entries by pushing the orange Edit project details button (Figure 2). The fields can now be edited to:
Rename the project (names are limited to 30 characters). The original project Name is the one selected when creating the project
Add or change a project description (up to 2000 characters)
Add or change a thumbnail of the project (supported formats are .jpg, .bmp, .gif, .png; the maximum size of the image file is 10 MB)
The Description accepts hyperlinks starting with "http://" or "https://"; if selected, a new browser tab will open to navigate to the website. The description will also be displayed to collaborators and administrators on the Partek® Flow® Home Page. The Choose File button launches a file browser showing the directory structure of the local computer, from which the thumbnail image file can be uploaded. Alternatively, the Clear thumbnail button removes the current thumbnail.
Once all the edits have been made, push Save to accept (or Cancel to reject).
If a thumbnail has been added, it will appear on the Project details tab (Figure 3) and on the home page of Partek Flow, on the Details tab of the project.
The Members section provides an overview of users associated with a particular project and enables project creators (owners) and administrators to add collaborators (Figure 1). A user (without administrator status) has to be specified as a collaborator in a project to be able to access the project in his/her home folder and to perform tasks.
To add a collaborator, use the Add member drop-down menu. The drop-down menu will list users you are collaborating with on any project on the current instance of Partek Flow. Click a user name in the drop-down list and then click the button to add them as a collaborator. To add a user you have not collaborated with before, type their exact username (e.g., jsmith) and click the button to add them. Depending on the collaborator's preference settings, an email notification may be sent to the email address associated with their user account. To delete a collaborator, select the next to their username (you will be asked for confirmation).
Pushing the pencil icon by a project member can result in two dialogs, depending on the status of the member. For a collaborator or a viewer, the pencil icon changes the member's role (e.g., from a Viewer to a Collaborator) (Figure 4).
Moreover, the project owner can transfer ownership to another user account (one of the accounts already present on the current instance of Partek Flow) using the New owner drop-down list (Figure 5). The previous owner can remain as a project collaborator by selecting the corresponding option.
If email notifications are turned on for project ownership transfer, an email dialog box appears. This can be used to add additional text to the notification email body (Figure 6).
After samples have been added and associated with valid data files in a Partek Flow project, a data node will appear in the Analyses tab (Figure 1). The Analyses tab is where the different analysis tools and their corresponding reports are accessed.
The Analyses tab contains two elements: data nodes (circles) and task nodes (rounded rectangles) connected by lines and arrows. Collectively, they represent a data analysis pipeline.
Data nodes (Figure 2) may represent a file imported into the project, or a file generated by Partek Flow as an output of a task (e.g., alignment of FASTQ files generates BAM files).
Figure 2. The Analyses tab showing a data node of unaligned reads
Task nodes (Figure 3) represent the analysis steps performed on the data within a project. For details on the tasks available in Partek Flow, see the specific chapters of this user manual dedicated to the different tasks.
Clicking on a node reveals the context sensitive menu, on the right side of the screen.
Only the tasks that are available for the selected data node will be listed in the menu. For data nodes, actions that can be performed on that specific data type will appear.
In Figure 4, a node that contains Unaligned reads is selected (bold black line). The tasks listed are the ones that can be performed on unaligned data (QA/QC, Pre-analysis tools, and Aligners).
To hide the context sensitive menu, simply click the symbol on the upper left corner of the context sensitive menu. Clicking the triangles will collapse ( ) or expand ( ) the different categories of tasks that are shown.
After a task is performed on a data node, a new task node is created and connected to the original data node. Depending on the task, a new data node may automatically be generated that contains the resulting data. For details of individual tasks, see Task Menu.
In Figure 5, alignment was performed on the unaligned reads. Two additional nodes were added: a task node for Align reads and an output data node containing the Aligned reads.
To run a task, select a data node and then locate the task you wish to perform from the context sensitive menu. Mouse over to see a description of the action to be performed. Click the specific task, set the additional parameters (Figure 6), and click Finish. The task will be scheduled, the display will refresh, and the screen will return to the project's Analyses tab.
In Figure 6, the STAR aligner was selected and the choices for the aligner index and additional alignment options appeared.
Tasks that are currently running (or scheduled in the queue) appear as translucent nodes. The progress of the task is indicated by the progress bar within the task node. Hovering the mouse pointer over the node will highlight the related nodes (with a thin black outline) and display the status of the task (Figure 7).
If a task is expected to generate data nodes, the expected nodes appear even before the task is completed. They have a lighter shade of color to indicate that they have not yet been generated while the task is still running. Once all tasks are done, all nodes appear in the same shade.
Tasks can only be cancelled by the user that started the task or by the owner of the project. Running or pending tasks can be canceled by clicking the right mouse button on the task node and then selecting Cancel (Figure 8). Alternatively, the task node may be selected and the Cancel task selected from the context sensitive menu.
A verification dialog will appear (Figure 9) asking to confirm the task cancellation. Cancelled tasks will remain in the Analyses tab but will be flagged by gray x circles on the nodes (Figure 10).
Data nodes connected to incomplete tasks are also incomplete as no output can be generated (Figure 10).
To delete tasks from the project click the right mouse button on the task node and then select Delete (Figure 11). Alternatively, click the task node and select Delete task from the context sensitive menu.
A verification dialog will appear (Figure 12). A yellow warning sign will be shown if downstream tasks performed by collaborators will be affected. Deleting the task's output files is optional; if this option is not selected, the task nodes will disappear from the Analyses tab but the output files will remain in the project output directory.
Selecting a task node will reveal a menu pane with two sections: Task results and Task actions (Figure 13).
Items in the Task results section report on the action performed by that node. Certain tasks generate a Task report (Figure 14), which includes any tables or charts the task produced.
The Task details page shows detailed diagnostic information about the task (Figure 15), including the duration and parameters of the task, lists of input and output files, and the actual commands (in the command line) that were run.
Additionally, the Task details page contains the error logs of unsuccessful runs. The user can download the logs or send them directly to Partek. This page plays an important role in diagnosing and troubleshooting issues related to a task.
Double-clicking a task node will show the Task report page. If no report was generated, the user will be directed to the Task details page instead.
In the Task actions section, the selected task can be Re-run w/new parameters; if it is part of a pipeline that includes additional tasks after it, running the Downstream tasks is also an option. Re-running tasks will create a new layer in the Analyses tab.
Another action available for a task node is Add task description (Figure 16), which is a way to add notes to the project. The user can enter a description, which will be displayed when the mouse pointer is hovered over the task node.
It is common for next-generation sequencing data analysis to examine different task parameters for optimization. Users may want to modify an upstream step (e.g. alignment parameters) and inspect its effect on downstream results (e.g. percent aligned reads).
The implementation of Layers makes optimization easy and organized. Instead of creating separate nodes in a pipeline, another set of nodes with a different color is stacked on top of the previous analyses (Figure 17). To see the parameters that were changed between runs, hover the mouse pointer over the set of stacked task nodes and a pop-up balloon will display them. The text color signifies the layer corresponding to a specific parameter.
Layers are formed when the same task is performed on the same data node more than once. They are also formed when a task node is selected and Re-run w/new parameters is chosen in the context sensitive menu, which allows the user to change the options only for the selected task. The user may choose to re-run the task to which the changes have been made, as well as all the downstream tasks, until the analysis is completed. To do so, select Re-run w/new parameters, downstream tasks from the context sensitive menu.
To select a different layer, use the left mouse button to click on any node of the desired layer. All the nodes associated with the selected layer have the same color and when clicked will be displayed on the top of the stack.
The addition of task and data nodes to a project may lead to long pipelines, extending well beyond the border of the canvas (Figure 18).
In that case, the pipeline can be collapsed to hide the steps that are no longer relevant. For example, once single-cell RNA-seq data has been quantified, the Single cell counts data node becomes the new starting point for analysis, as the subsequent analyses will not focus on alignment, UMI deduplication, etc. To start, right-click the task node that should become one boundary of the collapsed portion of your pipeline and select Collapse tasks (Figure 19).
All the tasks on that layer will turn purple. Then left-click the task which should be the other boundary of the collapsed portion. All the tasks that will be collapsed will turn green and a dialog will appear (Figure 19). In the example shown in Figure 19, the tasks between Trim tags and Quantify barcodes will be collapsed. Give the collapsed section a name (up to 30 characters) and select Save (Figure 20).
The collapsed portion of the pipeline is replaced by a single task node with a custom label ("Single cell preprocessing"; Figure 21).
To re-expand the pipeline, double-click the task node representing the collapsed portion of the pipeline. Alternatively, single-click the node and select Expand... from the context-sensitive menu. Within the same menu, you can also preview the contents of the collapsed task by selecting View... (Figure 22).
Data associated with any data node can be downloaded using the Download data link in the context sensitive menu (Figure 23). Compressed files will be downloaded to the local computer from which the user is accessing the Partek Flow server. Note that bigger files (such as unaligned reads) will take longer to download. For guidance, a file size estimate is provided for each data node. These zipped files can easily be imported by the Partek® Genomics Suite® software.
This tutorial gives an overview of RNA-Seq analysis with Partek Flow. It will guide you through creating an RNA-Seq analysis pipeline. The goals of the analysis are to create a list of differentially expressed genes, visualize these gene expression signatures by hierarchical clustering, and interpret the gene lists using gene ontology (GO) enrichment.
This tutorial will illustrate:
This tutorial uses a subset of the data set published in Xu et al. 2013 (PMID: 23902433). In the experiment, mRNA was isolated from HT29 colon cancer cells treated with the drug 5-aza-deoxy-cytidine (5-aza) at three different doses: 0μM (control), 5μM, or 10μM. The mRNA was sequenced using Illumina HiSeq (paired end reads). The goal of the experiment was to identify differentially expressed genes between the different treatment groups.
A project may be deleted from the Partek Flow server using the button on the upper right side of the Project View page (Figure 1).
Alternatively, you can also delete your projects directly from the Homepage by clicking Delete project under the Actions column (Figure 2).
After clicking the Delete project, a page displaying all the files associated with the project appears. Clicking the triangle will expand the list. Select the files to be deleted from the server by clicking the corresponding checkboxes next to each file (Figure 3). By default, all output files generated by the project will be deleted.
If you wish to delete the input files associated with the project, you can do that as well by clicking the Input files checkbox. Note that a warning icon appears next to input files that are used in other projects (Figure 4). These cannot be deleted until all projects associated with them are deleted.
Every project can be exported before it is removed from the server. By exporting old projects, you can free up some storage on your server. You can import the exported project back into Partek Flow later if needed.
Open a project and on the analysis page, click on the gear button and choose Export project (Figure 1). You can also export the project directly from Partek Flow home page by clicking the icon under the Action column (Figure 2).
When you export a project, you will be asked whether or not to include library files in the export. If you choose Yes, the versions of the library files used in the project will be archived, so you can reproduce the results when you later import the project and re-run the analysis; however, this makes the archive larger. If you choose No, the library files will not be exported. Note that when you import the project later, you can only use the currently available versions of the needed library files to re-do the same analysis, and the results might not be identical.
The Import project option is under Projects drop-down menu on the top of the Partek Flow page (Figure 5). This can be accessed on any Partek Flow page.
The input of this option is the zipped file of the exported project. Browse to the file location which can either be the Partek Flow server, a local machine, or a URL. The zip file first needs to be uploaded to the Partek Flow server (if it is not on the server already), and then Partek Flow will unpack the zip file into a project. The project name will be the same as the exported project name. If the project with the same name already exists, the imported project will have a number appended to it (e.g., ProjectName_1).
The owner of the imported project will be the user that imported it.
If a project is publicly available in the Gene Expression Omnibus (GEO) and European Nucleotide Archive (ENA) databases, you can import associated FASTQ files, sample attributes, and project details automatically into Partek Flow.
Click Projects at the top of the page
Click Import project
Choose GEO / ENA project for Select files from
Type the BioProject ID or the GEO Accession number
The format of a BioProject ID is PRJNA followed by one to six numbers (e.g., PRJNA291540). The format of a GEO Accession number is GSE followed by one to five numbers (e.g., GSE71578).
Click Import project at the bottom
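The accession formats described above can be checked programmatically. The sketch below is a hypothetical helper (the function name and regex construction are ours; only the PRJNA/GSE patterns come from the documentation) that classifies an identifier before attempting an import:

```python
import re

# Patterns follow the formats stated in the documentation:
# PRJNA + one to six digits, GSE + one to five digits.
BIOPROJECT_RE = re.compile(r"^PRJNA\d{1,6}$")
GEO_RE = re.compile(r"^GSE\d{1,5}$")

def accession_type(accession: str) -> str:
    """Classify an accession as a BioProject ID, a GEO series, or unknown."""
    if BIOPROJECT_RE.match(accession):
        return "BioProject"
    if GEO_RE.match(accession):
        return "GEO"
    return "unknown"

print(accession_type("PRJNA291540"))  # BioProject
print(accession_type("GSE71578"))     # GEO
```

This kind of check only validates the identifier's shape; whether the project is actually available in GEO/ENA still has to be verified at import time.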
The Analyses tab will include an Unaligned reads data node once the data download has started (Figure 3). It may take a while for the download to complete depending on the size of the data. FASTQ files are downloaded from the ENA BioProject page.
If the study is not publicly available in both GEO and ENA, project import will not succeed.
If there is an ENA project, but the FASTQ files are not available through ENA, the project will be created, but data will not be imported.
A variety of other issues and irregularities can cause imports to not succeed or partially succeed, including, but not limited to, a BioProject having multiple associated GSE IDs, incomplete information on the GEO or ENA page, and either the GEO or ENA project not being publicly available.
The Gene Expression Omnibus (GEO) and the European Nucleotide Archive (ENA) are web-accessible public repositories for genomic data and experiments. Access and learn more about their resources at their respective websites:
GEO - https://www.ncbi.nlm.nih.gov/geo/
ENA - https://www.ebi.ac.uk/ena
You can search ENA using the GEO ID (e.g., GSE71578) to check if there is a matching ENA project (Figure 6).
Attributes describe samples. Examples of sample attributes include treatment group, age, sex, and time point. Attributes can be added individually in the Metadata tab or in bulk using a text file. In this tutorial, we will add one attribute, 5-AZA Dose, manually.
Click the Metadata tab
Click Manage under Sample attributes (Figure 1)
Click Add new attribute (Figure 2)
To configure a new attribute, at Name, type in 5-AZA Dose as the name of the attribute
Click Add to add 5-AZA Dose as a categorical, project-specific attribute (Figure 3)
Name the first New category 0uM
Click the green plus icon to add category (Figure 4)
Repeat for two additional categories, 5uM and 10uM (Figure 5)
Click Back to metadata tab
The data table now includes an attribute column for 5-AZA Dose (Figure 6). Next, we need to assign attribute categories for 5-AZA Dose to the samples.
Select Assign values
The option to edit the 5-AZA Dose field for each sample will appear as a drop-down menu (Figure 7).
Select the 5-AZA Dose text box for a sample to bring up a drop-down menu with the 5-AZA Dose attribute categories (0uM, 5uM, 10uM)
Use the drop-down menus to add a treatment group for each sample
The first three samples (SRR592573-5) should be 0uM, the next three samples (SRR592576-8) should be 5uM, and the final three samples (SRR592579-81) should be 10uM (Figure 8).
Click Apply changes
The data table will now show the 5-AZA Dose attribute for each sample.
With attributes added, we can begin building our pipeline.
Click the Analyses tab
In the Analyses tab, data are represented as circles, termed data nodes. One data node, mRNA, should be visible in the Analyses tab (Figure 1).
Click the mRNA node
Clicking a data node brings up the context-sensitive task menu with tasks that can be performed on the data node (Figure 2).
Pre-alignment QA/QC assesses the quality of the unaligned reads and will help us determine whether trimming or filtering is necessary.
Click Pre-alignment QA/QC in the QA/QC section of the task menu
Click Finish to run the task with default settings
Running a task creates a task node, e.g., the blue rectangle labeled Pre-alignment QA/QC (Figure 3), which contains details on the task and a report. While tasks are queued or in progress, they have a lighter color. Any output nodes that the task will generate are also displayed in a lighter color until the task completes. Once the task begins running, a progress bar is displayed on the task node.
Click the Pre-alignment QA/QC node
The context-sensitive task menu (Figure 4) shows the option to view the Task report and the Task details. You can also access a task report by double-clicking on a task node.
Click Task report
Pre-alignment QA/QC provides information about the sequencing quality of unaligned reads (Figure 5). Both project-level and sample-level summaries are provided.
Click sample SRR592573 in the data table of the report to open its sample-level report
The Average base quality score per position graph in the upper right-hand panel (Figure 6) gives the average Phred score for each position in the reads.
A Phred score is a measure of base call accuracy with a higher score indicating greater accuracy.
By convention, a score above 20 is considered adequate. As you can see, the standard error bars in the graph show that some reads have quality scores below 20 for some of their base pair calls near the 3' end.
Based on the results of Pre-alignment QA/QC, while most of the reads are high quality, we will need to perform read trimming and filtering. For more information about the information included in the task report, please see the Pre-alignment QA/QC user guide.
Click RNA-Seq 5-AZA to return to the Analyses tab
The tutorial data set includes 9 samples equally divided into 3 treatment groups. Sequencing was performed on an Illumina HiSeq (paired-end reads), but the workflow can be easily adapted for data generated by other sequencers. Each sample has 2 FASTQ files, for a total of 18 FASTQ files.
You can obtain the tutorial data set through Partek Flow.
Click your avatar
Click Settings in the drop-down menu (Figure 1)
At the top of the System information page, there is a section labeled Download tutorial data (Figure 2).
Click RNA-Seq 5-AZA to download the tutorial data set
A new project will be created and you will be directed to the Analyses Tab. The data will be downloaded automatically (Figure 3) and imported into your project. Because this is a tutorial project, there is no need to click on Add data as it will be done automatically.
At first the project is empty, but the file download will start automatically in the background. You can wait a few minutes then refresh your browser or you can monitor the download progress using the Queue.
Click Queue
Click View Queued Tasks in the drop-down menu
The Queued tasks page will open (Figure 4).
Click Projects
Click RNA-Seq 5-AZA in the drop-down menu
The Analyses tab will open (Figure 5). If your download has completed, you will see a blue circle titled mRNA.
Once the download completes, the sample table will appear in the Metadata tab.
Click the Metadata tab
The Metadata tab includes the sample table with the names of each imported sample (Figure 6).
In the next section of the tutorial, we will add a sample attribute that indicates the treatment group of each sample.
Phred Quality Score | Probability of incorrect base call | Base call accuracy |
---|---|---|
20 | 1 in 100 | 99% |
30 | 1 in 1,000 | 99.9% |
40 | 1 in 10,000 | 99.99% |
50 | 1 in 100,000 | 99.999% |
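These values follow directly from the Phred definition Q = -10 · log10(p), where p is the probability of an incorrect base call. A quick sketch reproducing the table:

```python
import math

def phred_error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)

def base_call_accuracy(q: int) -> float:
    """Base call accuracy, as a percentage."""
    return 100 * (1 - phred_error_probability(q))

for q in (20, 30, 40, 50):
    print(q, phred_error_probability(q), f"{base_call_accuracy(q):.3f}%")
```

So a Phred score of 20 corresponds to a 1-in-100 chance of error, and each additional 10 points reduces the error probability tenfold.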
RNA-Seq uses the number of sequencing reads per gene or transcript to quantify gene expression. Once reads are aligned to a reference genome, we need to assign each read to a known transcript or gene to give a read-count per transcript or gene.
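As a simplified illustration of the idea (not Partek's E/M algorithm, which additionally resolves reads overlapping multiple features probabilistically), read-to-gene assignment can be sketched as coordinate overlap counting. The gene coordinates and read positions below are invented for the example:

```python
from collections import Counter

# Hypothetical gene model: name -> (start, end) on one chromosome.
genes = {"GENE_A": (100, 500), "GENE_B": (600, 900)}

def overlapping_genes(read_start: int, read_end: int) -> list:
    """Return the genes a read overlaps (half-open interval test)."""
    return [name for name, (g_start, g_end) in genes.items()
            if read_start < g_end and read_end > g_start]

# Aligned read coordinates (start, end), invented for the example.
reads = [(120, 170), (450, 520), (650, 700), (950, 1000)]
counts = Counter()
for start, end in reads:
    for gene in overlapping_genes(start, end):
        counts[gene] += 1

print(dict(counts))  # read counts per gene; the last read hits no gene
```

A real quantifier must also handle exon/intron structure, strandedness, and ambiguous reads, which is what distinguishes methods like Partek E/M from this naive counter.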
Click the Aligned reads data node
Click Quantification in the task menu
We will use Partek E/M to quantify reads to an annotation model in this tutorial. For more information about the other quantification options, please see the Quantification user guide.
Click Quantify to an annotation model (Partek E/M) (Figure 1)
We will use the default options for quantification. To learn more about the different options, please see the Quantify to annotation model (Partek E/M) user guide or mouse over the next to each option.
Choose the latest RefSeq Transcripts 95 annotation from the Gene/feature annotation drop-down menu (you may need to download it first, via Library File Management)
Click Finish (Figure 2)
The Quantify to annotation model task node outputs two data nodes, Gene counts and Transcript counts (Figure 3).
To view the results of quantification, we can select either data node output.
Double-click the Gene counts data node to view the task report
The task report details the number of reads within exons, introns, and intergenic regions. For detailed information about the quantification results, see the Quantify to annotation model (Partek E/M) user guide.
After alignment has completed, we can view the quality of alignment by performing post-alignment QA/QC.
Click the Aligned reads data node
Click QA/QC in the task menu
Click Post-alignment QA/QC from the QA/QC section of the task menu (Figure 1)
A Post-alignment QA/QC task node will be generated (Figure 2).
Double-click the Post-alignment QA/QC task node to view the task report
Similar to the Pre-alignment QA/QC task report, general quality information about the whole data set is displayed and sample-level reports can be opened by clicking a sample name in the table.
The top two graphs in the data set view (Figure 3) show the alignment breakdown and coverage.
From these graphs, we can see that more than 95% of reads were aligned, but the total number of reads for each sample varies. Normalizing for the variability in total read counts will be addressed in a later section of the tutorial.
For more information about the graphs and information presented in the Post-alignment QA/QC task report, see the Post-alignment QA/QC user guide.
With our reads trimmed, we now have high-quality reads for each sample. The next step is to align the reads to a reference genome. Alignment matches each of the short sequencing reads to a location in the reference genome.
Click Trimmed reads
Click Aligners in the task menu to display available aligners (Figure 1)
Partek Flow offers a variety of different aligners. Mouse over any option for a short description. For this tutorial, we will use STAR, a fast and accurate aligner commonly used for RNA-Seq data. For more information about STAR and the other aligners, please consult the Aligners user guide.
Click STAR
The STAR aligner options allow us to select the genome build (assembly) and index. For this tutorial, our data set contains only reads that map to chromosome 22 to minimize the time required for resource-intensive tasks, such as alignment.
Click Finish to run with hg19 selected for Assembly and Whole genome for the Aligner index (Figure 2)
Alignment is a resource-intensive task and may take around 20 minutes to complete, even when mapping only reads from a single chromosome. Task and data nodes that have been queued, but not completed, are shown in a lighter color than completed tasks (Figure 3).
The Align reads task generates an Aligned reads data node once complete. You can wait for the alignment task to finish or you can continue building the pipeline while the results of alignment are pending; additional tasks can be added to the pipeline and queued before the current task has completed.
Based on pre-alignment QA/QC, we need to trim low quality bases from the 3' end of reads.
Click the Unaligned reads data node
Click Pre-alignment tools in the task menu
Click Trim bases (Figure 1)
By default, Trim bases removes bases starting at the 3' end and continuing until it finds a base call with a Phred quality score of 35 or greater (Figure 2).
Click Finish to run Trim bases with default settings
The Trim bases task will generate a new data node, Trimmed reads (Figure 3). We can view the task report for Trim bases by double-clicking either the Trim bases task node or the Trimmed reads data node or choosing Task report from the task menu.
Double-click the Trimmed reads data node to open the task report
The report shows the percentage of trimmed reads and reads removed in a spreadsheet and two graphs (Figure 4).
The results are fairly consistent across samples, with ~2% of reads untrimmed, ~86% trimmed, and ~12% removed for each. The average quality score of each sample increases after trimming, with the largest gains at the 3' ends.
Click RNA-Seq 5-AZA to return to the Analyses tab
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Because different samples have different total numbers of reads, it would be misleading to calculate differential expression by comparing read count numbers for genes across samples without normalizing for the total number of reads.
Click the Filtered counts data node
Click Normalization and scaling in the task menu
Click Normalization (Figure 1)
The Count normalization menu will open (Figure 2).
Normalization can be performed by sample or by feature. By sample is selected by default; this is appropriate for the tutorial data set.
Available normalization methods are listed in the left-hand panel. For more information about these options, please see the Normalize counts user guide.
For this tutorial, we will use the recommended default normalization settings.
Select
This adds the Median ratio normalization method, which is suitable for performing differential expression analysis using DESeq2 (Figure 3).
Click Finish to perform normalization
A Normalize counts task node and a Normalized counts data node are added to the pipeline (Figure 4)
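Median ratio normalization follows the DESeq2 size-factor approach: each sample's size factor is the median, across genes, of that sample's counts divided by the gene's geometric mean across samples. As a rough illustration of the idea (not Partek Flow's actual implementation), here is a minimal sketch with a made-up count matrix:

```python
import numpy as np

# Hypothetical count matrix: rows = genes, columns = samples
counts = np.array([
    [100, 200, 150],
    [ 50, 100,  75],
    [ 10,  20,  15],
    [200, 400, 300],
], dtype=float)

def median_ratio_size_factors(counts):
    """DESeq2-style size factors: per-sample median of the ratios of
    counts to the per-gene geometric mean, using only genes with
    non-zero counts in every sample."""
    log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)       # log geometric mean per gene
    finite = np.isfinite(log_geo_means)           # drop genes with any zero count
    log_ratios = log_counts[finite] - log_geo_means[finite, None]
    return np.exp(np.median(log_ratios, axis=0))

sf = median_ratio_size_factors(counts)
normalized = counts / sf   # counts divided by each sample's size factor
```

Dividing by the size factors puts samples with different sequencing depths on a comparable scale; in this toy example, where the samples differ only by depth, the normalized counts become equal across samples.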
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Low expression genes may be indistinguishable from noise and will decrease the sensitivity of differential expression analysis.
Click the Gene counts node
Click Filtering in the task menu
Click Filter features (Figure 1)
Click Noise reduction filter
Set the filter to maximum <= 10
Click Finish (Figure 2)
A new Filtered counts node will be created (Figure 3).
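The effect of this filter can be sketched in a few lines. The example below is an illustration of the "exclude features where the maximum is <= 10" logic with a hypothetical matrix, not Partek Flow's implementation:

```python
import numpy as np

# Hypothetical count matrix: rows = genes, columns = samples
counts = np.array([
    [ 0,  2,  5],   # excluded: maximum count is 5
    [12, 30, 18],   # kept: maximum count is 30
    [ 8,  9, 10],   # excluded: maximum count is exactly 10
])

# Keep genes whose maximum count across samples exceeds 10,
# mirroring a "maximum <= 10" noise reduction filter
keep = counts.max(axis=1) > 10
filtered = counts[keep]
```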
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
In addition to the volcano plot showing all genes, we can view expression levels of each gene on a dot plot.
Double-click the **Filtered feature list** data node to open the task report
Click the FDR step up header in the 5μM vs. 0μM section to sort by ascending FDR step up
In the task report table, there is a column labeled View with three icons in each row.
Select to open a dot plot for the gene SELENOM
The dot plot for SELENOM (Figure 1) shows each sample as a point with normalized reads on the y-axis. Samples are separated and colored by treatment group.
The principal components analysis (PCA) scatter plot allows us to visualize similarities and differences between the samples in a data set.
Click the Normalized counts data node
Click Exploratory analysis in the task menu
Click PCA
Click Finish to run PCA with the default options
The PCA task node will be added to the pipeline (Figure 1)
Double click the PCA data node to open the PCA scatter plot (Figure 2)
In the Data Viewer, click Style under Configure and set the Color by drop-down to 5-AZA Dose. The scatter plot shows each sample as a sphere, colored by treatment group, in a three dimensional plot. The x, y, and z axes are the first three principal components. The percentage of total variance explained by each is listed next to the axis label. The size of each axis is determined by the variance along that axis. The plot is fully interactive; it can be rotated and points selected.
Here, we can see that samples separate based on treatment, but there is noticeable separation within treatment groups, particularly the 0μM and 10μM treatment groups.
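For intuition about what the PCA scatter plot shows, PCA can be computed from the singular value decomposition of the mean-centered data matrix. The sketch below uses simulated data (two hypothetical treatment groups shifted apart on a subset of genes) and is only an illustration of the technique, not Partek Flow's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical normalized expression matrix: 6 samples x 100 genes,
# with the second group of 3 samples shifted up on 50 genes
data = rng.normal(size=(6, 100))
data[3:, :50] += 4.0

# PCA via SVD of the mean-centered matrix
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = u * s                       # sample coordinates on the PCs
explained = s**2 / (s**2).sum()      # fraction of variance per component
```

Plotting the first three columns of `scores` gives the 3D scatter plot, and `explained` corresponds to the percentages shown next to each axis label.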
From the DESeq2 task report, we can browse to any gene in the Chromosome view.
Click in the SELENOM row to open Chromosome view (Figure 1)
A new tab will open showing SELENOM in the Chromosome view (Figure 2).
Chromosome View shows reference genome, annotation, and data set information together aligned at genomic coordinates.
The top track shows average number of total count normalized reads for each of the three treatment groups in a stacked histogram. The second track shows the RefSeq annotation.
We can add tracks from any data node using Select Tracks.
Click Select tracks
A pop-up dialog showing the pipeline allows us to choose which data to display as tracks in Chromosome view (Figure 3).
Click Reads pileup under Aligned reads on the left-hand side of the dialog
Click Display tracks to make the change
The reads pileup track is now included (Figure 4).
Once we have performed DESeq2 to identify differentially expressed genes, we can create a list of significantly differentially expressed genes using cutoff thresholds.
Double click the Feature list data node to open the task report
The task report spreadsheet will open showing genes on rows and the results of DESeq2 on columns (Figure 1).
To get a sense of what filtering thresholds to set, we can view a volcano plot for a comparison.
A volcano plot will open showing p-value on the y-axis and fold-change on the x-axis (Figure 2). If the gene labels are on (not shown), click on the plot to turn them off.
Thresholds for the cutoff lines are set using the Statistics card (Configuration panel > Configure > Statistics). The default thresholds are |2| for the X axis and 0.05 for the Y axis.
Switch to the browser tab showing the DESeq2 report
Click FDR step up
Click the triangle next to FDR step up to open the FDR step up options
Leave All contrasts selected
Set the cutoff value to 0.05. Hit Enter.
This will include genes that have an FDR step up value of less than or equal to 0.05 for all three contrasts: 5μM vs. 0μM, 10μM vs. 0μM, and 5μM:10μM vs. 0μM. FDR step up is the false discovery rate adjusted p-value, used by convention in microarray and next-generation sequencing data sets in place of the unadjusted p-value.
Click Fold-change
Click the triangle next to Fold-change to open the Fold-change options
Leave All contrasts selected
Set to From -2 to 2 with Exclude range selected. Hit Enter.
Note that the number of genes that pass the filter is listed at the top of the filter menu next to Results: and will update to reflect any changes to the filter. Here, 27 genes pass the filter (Figure 3). Depending on your settings, the number may be slightly different.
This creates a Filter list task node and a Filtered feature list data node (Figure 4).
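The two filters applied above can be expressed in a few lines of code. The sketch below implements a Benjamini-Hochberg step-up adjustment (the standard method behind "FDR step up") and the fold-change exclusion on hypothetical values; it illustrates the filtering logic rather than Partek Flow's implementation:

```python
import numpy as np

def fdr_step_up(pvals):
    """Benjamini-Hochberg step-up adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downward
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Hypothetical DESeq2 results for five genes
pvals = np.array([0.001, 0.01, 0.04, 0.2, 0.5])
fold_change = np.array([3.5, -2.6, 1.4, -4.0, 2.2])   # linear scale

fdr = fdr_step_up(pvals)
# Keep genes with FDR <= 0.05 and fold change outside the -2 to 2 range
passes = (fdr <= 0.05) & (np.abs(fold_change) >= 2)
```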
After normalizing the data, we can perform differential analysis to identify genes that are differentially expressed based on treatment.
Click the Normalized counts node
Click Statistics in the task menu
Click Differential analysis in the task menu (Figure 1)
Check 5-AZA Dose and click Add factors to add the attribute to the statistical model.
Select Next to continue with 5-AZA Dose as the selected attribute
The Comparisons page will open (Figure 4).
It is easiest to think about comparisons as the questions we are asking. In this case, we want to know what are the differentially expressed genes between untreated and treated cells. We can ask this for each dose individually and for both collectively.
The upper box will be the numerator and the lower box will be the denominator in the comparison calculation so we will select the 0μM control in the lower box.
Drag 5μM to the upper box
Drag 0μM to the lower box
Click Add comparison to add 5μM vs. 0μM to the comparison table (Figure 5)
Repeat to create comparisons for 10μM vs. 0μM and 10μM,5μM vs. 0μM (Figure 6)
Click Finish to perform DESeq2 as configured
A DESeq2 task node and a DESeq2 data node will be added to the pipeline (Figure 7).
To check how well our list of differentially expressed genes distinguishes one treatment group from another, we can perform hierarchical clustering based on the gene list. Clustering can also be used to discover novel groups within your data set, identify gene expression signatures distinguishing groups of samples, and to identify genes with similar patterns of gene expression across samples.
Click the Feature list data node
Click Exploratory analysis in the task menu
Click Hierarchical clustering (Figure 1)
The Hierarchical clustering menu will open (Figure 2). Hierarchical clustering can be performed with a heatmap or bubble map plot. Cluster must be selected under Ordering for both Feature order and Cell order if both the features (columns) and cells (rows) are to be clustered.
Click Finish to run with default settings
A Hierarchical clustering task node will be added to the pipeline (Figure 3).
Double-click the Hierarchical clustering / heatmap task node to launch the heatmap
The Dendrogram view will open showing a heatmap with the hierarchical clustering results (Figure 4).
Samples are shown on rows and genes on columns. Clustering for samples and genes is shown through the dendrogram trees. More similar samples/genes are separated by fewer branch points of the dendrogram tree.
The heatmap displays standardized expression values with a mean of zero and standard deviation of one.
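Standardizing each gene to mean zero and standard deviation one makes genes with different absolute expression levels comparable on a single color scale. A minimal sketch of this transformation on a hypothetical matrix (using the sample standard deviation; the exact variant Partek Flow uses is not shown here):

```python
import numpy as np

# Hypothetical normalized expression: rows = samples, columns = genes
expr = np.array([
    [5.0, 1.0, 3.0],
    [6.0, 2.0, 3.5],
    [1.0, 8.0, 9.0],
])

# Standardize each gene (column) to mean 0 and standard deviation 1,
# as displayed on the clustered heatmap
z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
```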
The heatmap can be customized to improve data visualization using the menu on the Configuration panel on the left.
Expand the Annotations > Data card.
In the dialog, click on the Gene counts node
Now set the Row annot to 5-AZA Dose
Samples are now labeled with their 5-AZA Dose group (Figure 5).
Samples cluster based on treatment group and the 5μM and 10μM groups are more similar to each other than to the 0μM group.
We can save the heatmap as a publication-quality image.
Choose size and resolution using the Save as SVG dialog (Figure 6)
Select Save
The heatmap will be saved as a .svg file and downloaded in your web browser.
For more information about the dot plot, please see the user guide. To return to the DESeq2 report, switch to the browser table with filtered feature list.
In the DESeq2 report, you can select to view additional information about the statistical results for a gene or select to view the region in Chromosome View. Chromosome View is discussed in the next section of the tutorial.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
For more detailed information about the PCA scatter plot, please see the user guide.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Each track has Configure track and Move track buttons that can be used to modify each track.
To learn more about Chromosome view, please consult the user guide.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Click next to the 5μM vs. 0μM comparison
Click to create a data node with only the genes that pass the filter
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Select the appropriate differential analysis method (Figure 2). In this tutorial we are going to use DESeq2, but Partek Flow offers a number of alternatives. Hover the mouse over the symbol for more information on each differential analysis method, or see our user guide for a more in-depth look.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Click the gray button on the right-hand side of Row annot (None available)
Click the Export image icon in the top right corner of the plot
For more information about hierarchical clustering and the Dendrogram view, please see the user guide.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This tutorial presents an outline of the basic series of steps for analyzing a 10x Genomics Gene Expression with Feature Barcoding (antibody) data set in Partek Flow starting with the output of Cell Ranger.
If you have Cell Hashing data, please see our documentation on Hashtag demultiplexing.
This tutorial includes only one sample, but the same steps will be followed when analyzing multiple samples. For notes on a few aspects specific to a multi-sample analysis, please see our Single Cell RNA-Seq Analysis (Multiple Samples) tutorial.
If you are new to Partek Flow, please see Getting Started with Your Partek Flow Hosted Trial for information about data transfer and import and Creating and Analyzing a Project for information about the Partek Flow user interface.
The data set for this tutorial is a demonstration data set from 10x Genomics. The sample includes cells from a dissociated Extranodal Marginal Zone B-Cell Tumor (MALT: Mucosa-Associated Lymphoid Tissue) stained with BioLegend TotalSeq-B antibodies. We are starting with the Feature / cell matrix HDF5 (filtered) produced by Cell Ranger. Prior to beginning, transfer this file to your Partek Flow using the Transfer files button on the homepage.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
By following the steps in this tutorial, you have built a pipeline. You can save this pipeline for future use.
Select Create new pipeline near the bottom left-hand side of the browser window
Select the Pre-alignment QA/QC, Trim bases, Align reads, Post-alignment QA/QC, Quantify to annotation model, Filter features, Normalize counts, PCA, and GSA task nodes to include them in the pipeline
Name the pipeline; we have chosen RNA-Seq basic analysis
Give a description for the pipeline; we have noted trim <20, STAR, normalize with total count and add 0.001, GSA
Select Create pipeline (Figure 1)
To access this pipeline in the future, select an unaligned reads data node and choose Pipelines from the task menu. Available saved pipelines will be available to choose from the Pipelines section of the task menu (Figure 2).
After selecting the pipeline, you will be prompted to choose the reference genome for alignment, the annotation for quantification, and the contrasts for GSA. After selecting these options, the pipeline will automatically run. See Pipelines for more information.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
To learn more about the biology underlying gene expression changes, we can use gene ontology (GO) or pathway enrichment analysis. Enrichment analysis identifies over-represented GO terms or pathways in a filtered list of genes.
Click the filtered Filtered feature list data node
Click Biological interpretation in the task menu
Click Gene set enrichment then select Gene set database to perform GO enrichment analysis (Figure 1)
Select the latest gene set from geneontology.org from the Gene set database drop-down menu
Click Finish
A GO enrichment task node will be added to the pipeline (Figure 2).
Double-click the Gene set enrichment task node to open the task report (Figure 3)
The GO enrichment task report spreadsheet lists GO terms by ascending p-value with the most significant GO term at the top of the list. Also included are the enrichment score, the number of genes from that GO term in the list, and the number of genes from that GO term that are not in the list.
To view the genes associated with each GO term, select to open the extra details page. To view additional information about a GO term, click the blue gene set ID to open the linked geneontology.org entry in a new tab.
For more information about GO enrichment analysis, please see the Gene Set Enrichment user guide.
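The exact statistic Partek Flow uses is not shown here, but over-representation p-values of this kind are commonly computed from the hypergeometric distribution: given N genes in total, K annotated to a GO term, and a filtered list of n genes, the p-value is the probability of seeing k or more annotated genes in the list by chance. A minimal stdlib sketch under that assumption:

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """Hypergeometric upper tail P(X >= k): the chance of drawing k or
    more genes annotated to a term (K of N genes overall) when picking
    a list of n genes at random."""
    return sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Toy numbers: 10 genes total, 5 annotated, list of 4, all 4 annotated
p = enrichment_pvalue(10, 5, 4, 4)
```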
KEGG enrichment analysis identifies pathways that are over-represented in a gene list data node.
Click the filtered Filtered feature list data node
Click Biological interpretation in the task menu
Click Gene set enrichment then select KEGG database
Click Finish in the configuration dialog to run KEGG analysis with the Homo sapiens KEGG database
A Pathway enrichment task node will be added to the pipeline (Figure 4).
Double-click the Pathway enrichment task node to open the task report
The Pathway enrichment task report is similar to the Enrichment analysis task report (Figure 5).
To view an interactive KEGG pathway map, click the pathway ID (Gene set column).
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Let's start by creating a new project.
On the Home page, click New project (Figure 1)
Give the project a name
Click Create project
In the Analyses tab, click Add data
Click 10x Genomics Cell Ranger counts h5 (Figure 2)
Choose the filtered HDF5 file for the MALT sample produced by Cell Ranger
Move the .h5 file to where Partek Flow is installed using , then browse to its location.
Note that Partek Flow also supports the feature-barcode matrix output (barcodes.tsv, features.tsv, matrix.mtx) from Cell Ranger. The import steps for a feature-barcode matrix are identical to this tutorial.
Click Next
Name the sample MALT (the default is the file name)
Specify the annotation used for the gene expression data (here, we choose Homo sapiens (human) - hg38 and Ensembl Transcripts release 109). If Ensembl 109 is not available from the drop-down list, choose Add annotation and download it.
Check Features with non-zero values across all samples in the Report section
Click Finish (Figure 3)
A Single cell counts data node will be created under the Analyses tab after the file has been imported. We can move on to processing the data.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
The Single cell counts data node contains two different types of data, mRNA expression and protein expression. So that we can process these two different types of data separately, we will split the data by data type.
Click the Single cell counts data node
Click Pre-analysis tools in the toolbox
Click Split by feature type
A rectangular task node will be created along with two circular data nodes, one for each data type (Figure 1). The labels for these data types are determined by the features.csv file used when processing the data with Cell Ranger. Here, our data is labeled Gene Expression, for the mRNA data, and Antibody Capture, for the protein data.
An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few counts to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts. These are low-quality cells that can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts or a low number of detected features. You can do this in Partek Flow using the Single cell QA/QC task.
We will start with the protein data.
Click the Antibody Capture data node
Click QA/QC in the toolbox
Click Single Cell QA/QC
This produces a Single-cell QA/QC task node (Figure 2).
Double-click the Single cell QA/QC task node to open the task report
The Single cell QA/QC report opens in a new data viewer session. There are interactive violin plots showing the most commonly used quality metrics for each cell: the total count per cell and the number of detected features per cell (Figure 3). Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric.
For this analysis, we will set a maximum counts threshold to exclude potential protein aggregates and, because we expect every cell to be bound by several antibodies, we will also set a minimum counts threshold.
Select one of the plots on the canvas
Using Select & Filter on the left under Tools, set the Counts threshold to keep cells between 500 and 20000 (Figure 4)
Click under Filter on the right
Click Apply observation filter...
Select the Antibody Capture data node as input in the pipeline preview (Figure 5)
Click Select
You will see a message telling you a new task has been enqueued.
Click OK to dismiss the message
Click the project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
A new task, Filter counts, is added to the Analyses tab. This task produces a new Filter counts data node.
Next, we can repeat this process for the Gene Expression data node.
Click the Gene Expression data node
Click the QA/QC section in the toolbox
Click Single Cell QA/QC
This produces a Single-cell QA/QC task node
Double-click the Single cell QA/QC task node to open the task report
The task report lists the number of counts per cell, the number of detected features per cell, the percentage of mitochondrial reads per cell, and the percentage of ribosomal counts per cell in four violin plots (Figure 6). For this analysis, we will set maximum and minimum thresholds for total counts and detected genes to exclude potential doublets and a maximum mitochondrial reads percentage filter to exclude potential dead or dying cells. There is no need to apply a filter based on the percentage of ribosomal counts in this tutorial.
In the Selection card on the right, set the Counts threshold to keep cells between 1500 and 15000
Set the Detected features to keep cells between 400 and 4000
Set the % Mitochondrial counts to keep cells between 0% and 20% (Figure 6)
Click under Filter on the right
Click Apply observations filter
Select the Gene Expression data node as input in the pipeline preview
Click Select
Click OK to dismiss the message about the task being enqueued
Click the project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
A new task, Filter counts, is added to the Analyses tab. This task produces a new Filter counts data node (Figure 7)
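The combined effect of the three thresholds above amounts to a boolean mask over cells. A minimal sketch with hypothetical per-cell QC metrics (illustrating the filtering logic, not Partek Flow's implementation):

```python
import numpy as np

# Hypothetical per-cell QC metrics for five cells
counts   = np.array([  800, 5000, 12000, 30000, 2000])
features = np.array([  300, 1200,  2500,  5000,  900])
pct_mito = np.array([  5.0, 10.0,  35.0,   8.0, 12.0])

# Apply the tutorial thresholds: counts 1500-15000,
# detected features 400-4000, mitochondrial percentage 0-20
keep = ((counts >= 1500) & (counts <= 15000)
        & (features >= 400) & (features <= 4000)
        & (pct_mito <= 20.0))
```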
After excluding low-quality cells, we can normalize the data.
We will start with the protein data.
Click the Filtered counts data node produced by filtering the Antibody Capture data node
Click Normalization and scaling in the toolbox
Click Normalization
Click Finish to run (Figure 8)
The recommended normalization for protein data includes the following steps: Add 1, Divide by Geometric mean, Add 1, Log base 2. This is a variant of Centered log-ratio (CLR), which was used to normalize antibody capture protein counts data in the paper that introduced CITE-Seq [1] and in subsequent publications on similar assays [2, 3]. CLR normalization includes the following steps: Add 1, Divide by Geometric mean, Add 1, log base e. Normalizing the protein data to base 2 instead of e allows for better integration with gene expression data further downstream. If you would prefer to use CLR, click and drag CLR from the panel on the left to the right. If you do choose to use CLR, we recommend making sure the gene expression data is normalized to the base e, to allow for smoother integration further downstream.
Normalization produces a Normalized counts data node on the Antibody Capture branch of the pipeline.
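The recommended steps can be sketched as follows. This is an illustration of the "Add 1, Divide by Geometric mean, Add 1, Log base 2" recipe on hypothetical antibody counts for a single cell, assuming the geometric mean is taken over that cell's shifted feature values; it is not Partek Flow's implementation:

```python
import numpy as np

# Hypothetical antibody capture counts for one cell across 5 antibodies
counts = np.array([120.0, 45.0, 2.0, 300.0, 15.0])

def clr_like(x, log=np.log2):
    """Add 1, divide by the geometric mean, add 1, then log transform.
    log=np.log2 matches the recommended variant; log=np.log gives
    classic CLR (base e)."""
    shifted = x + 1
    geo_mean = np.exp(np.mean(np.log(shifted)))
    return log(shifted / geo_mean + 1)

normalized = clr_like(counts)
```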
Next, we can normalize the mRNA data. We will use the recommended normalization method in Partek Flow, which accounts for differences in library size (the total number of UMI counts per cell), adds 1, and log2 transforms the data.
Click the Filtered counts data node produced by filtering the Gene Expression data node
Click the Normalization and scaling section in the toolbox
Click Normalization
Click the button
Click Finish to run (Figure 9)
Normalization produces a Normalized counts data node on the Gene Expression branch of the pipeline (Figure 10).
For quality filtering and normalization, we needed to have the two data types separate as the processing steps were distinct. For downstream analysis, we want to be able to analyze protein and mRNA data together. To bring the two data types back together, we will merge the two normalized counts data nodes.
Click the Normalized counts data node on the Antibody Capture branch of the pipeline
Click Pre-analysis tools in the toolbox
Click Merge matrices
Click Select data node to launch the data node selector
Data nodes that can be merged with the Antibody Capture branch Normalized counts data node are shown in color (Figure 11).
Click the Normalized counts data node on the Gene Expression branch of the pipeline (Figure 11)
Click Select
Click Finish to run the task
The output is a Merged counts data node (Figure 12). This data node will include the normalized counts of our protein and mRNA data. The intersection of cells from the two input data nodes is retained so only cells that passed the quality filter for both protein and mRNA data will be included in the Merged counts data node.
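Because the merge keeps the intersection of cells, only barcodes that passed both quality filters survive. A minimal sketch of that intersection with hypothetical barcodes:

```python
# Hypothetical cell barcodes that passed each branch's quality filter
protein_cells = {"AAACCTG", "AAACGGG", "AAAGATG", "AACCATG"}
rna_cells     = {"AAACGGG", "AAAGATG", "AACCATG", "AACGTTG"}

# Merging retains only cells present in both data nodes
merged_cells = protein_cells & rna_cells
```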
To simplify the appearance of the pipeline, we can group task nodes into a single collapsed task. Here, we will collapse the filtering and normalization steps.
Right-click the Split by feature type task node
Choose Collapse tasks from the pop-up dialog (Figure 13)
Tasks that can be selected for the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 14). We have chosen the Split matrix task as the start and we can choose Merge matrices as the end of the collapsed section.
Click the Merge matrices task to choose it as the end of the collapsed section
Name the Collapsed task Data processing
Click Save (Figure 15)
The new collapsed task, Data processing, appears as a single rectangular task node (Figure 16).
To view the tasks in Data processing, we can expand the collapsed task.
Double-click Data processing to expand it or right-click and choose Expand collapsed task
When expanded, the collapsed task is shown as a shaded section of the pipeline with a title bar (Figure 17).
To re-collapse the task, you can double click the title bar or click the icon in the title bar. To remove the collapsed task, you can click the . Please note that this will not remove tasks, just the grouping.
Double-click the Data processing title bar to re-collapse
[1] Stoeckius, M., Hafemeister, C., Stephenson, W., Houck-Loomis, B., Chattopadhyay, P. K., Swerdlow, H., ... & Smibert, P. (2017). Simultaneous epitope and transcriptome measurement in single cells. Nature methods, 14(9), 865.
[2] Stoeckius, M., Zheng, S., Houck-Loomis, B., Hao, S., Yeung, B. Z., Mauck, W. M., ... & Satija, R. (2018). Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome biology, 19(1), 224.
[3] Mimitou, E., Cheng, A., Montalbano, A., Hao, S., Stoeckius, M., Legut, M., ... & Satija, R. (2018). Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay. bioRxiv, 466466.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This tutorial presents an outline of the basic series of steps for analyzing a single cell RNA-Seq experiment in Partek Flow starting with the count matrix file.
This tutorial includes only one sample, but the same steps will be followed when analyzing multiple samples. For notes on a few aspects specific to a multi-sample analysis, please see our Single Cell RNA-Seq Analysis (Multiple Samples) tutorial.
If you are new to Partek Flow, please see Getting Started with Your Partek Flow Hosted Trial for information about data transfer and import and Creating and Analyzing a Project for information about the Partek Flow user interface.
An important step in analyzing single cell RNA-Seq data is to filter out low quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. You can do this in Partek Flow using the Single cell QA/QC task.
Click on the Single cell data node
Click on the QA/QC section of the task menu
Click on Single cell QA/QC
A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running (Figure 1).
Click the Single cell QA/QC node once it finishes running
Double-click the Task report in the task menu
The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 2).
There can be four plots: the number of read counts per cell, the number of detected genes per cell, the percentage of mitochondrial reads per cell, and the percentage of ribosomal counts per cell.
Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Cells can be filtered either with the plot controls, by using the selection tools on the right of the plot (rectangle, ellipse, or lasso mode) and selecting a region on one of the plots, or by setting thresholds using the Select & Filter tool. Here, we will apply a filter for the number of read counts.
The plots will be shaded to reflect the selection. Cells that are excluded will be shown as dim dots on all plots.
The read counts per cell and number of detected genes per cell are typically used to filter out potential doublets - if a cell has an unusually high number of total counts or detected genes, it may be a doublet. The mitochondrial reads percentage can be used to identify cells damaged during cell isolation - if a cell has a high percentage of mitochondrial counts, it is likely damaged or dying and may need to be excluded.
Open Select & Filter in the left panel. The histograms can be pinned while fine-tuning the selections (Figure 2). Set the filters to capture the majority of the population, as indicated by the width of each violin.
Click the filter icon and choose Apply observation filter, then select the Single cell counts data node. This runs the Filter cells task on the Single cell counts data node, creating a Filtered counts task node that produces a Filtered cells results node.
Use Save as to give this Data Viewer session a new name (e.g. QA/QC filter) so you can return to this filter at any time and see the exact criteria that have been selected and filtered.
A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes (features). Because there is no gold standard for what makes a gene informative or not and ideal gene filtering criteria depends on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options. The Filter features step can also be performed before normalization or after normalization.
Click the data node containing count matrix
Click Filtering in the task menu
Click Filter features
There are four categories of filter available - noise reduction, statistics based, feature metadata, and feature list.
The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.
For example, you can use a noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file.
Click the Noise reduction filter check box
Set the Noise reduction filter to Exclude features where value <= 0 in at least 99.9% of cells using the drop-down menus and text boxes
Click Finish to apply the filter (Figure 3)
This results node, Filtered counts, will be the starting point for the next stage of analysis.
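The noise reduction filter used above ("exclude features where value <= 0 in at least 99.9% of cells") can be sketched as follows; this is a conceptual illustration, not Partek Flow's implementation:

```python
import numpy as np

def noise_filter(counts, threshold=0.0, frac=0.999):
    """Drop features whose value is <= `threshold` in at least `frac` of cells.

    counts: features x cells matrix. Returns the filtered matrix and the
    boolean keep mask. Conceptual sketch only.
    """
    low = (counts <= threshold).mean(axis=1)  # fraction of cells at/below threshold
    keep = low < frac                         # keep features below the cutoff
    return counts[keep], keep
```

For example, a feature with zero counts in every cell is excluded, while a feature expressed in even a handful of cells is kept.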
Because different cells will have a different number of total counts, it is important to normalize the data prior to downstream analysis. For droplet-based single cell isolation and library preparation methods that use a 3' counting strategy, where only the 3' end of each transcript is captured and sequenced, we recommend the following normalization - 1. CPM (counts per million), 2. Add 1, 3. Log2. This accounts for differences in total UMI counts per cell and log transforms the data, which makes the data easier to visualize.
Click the Filtered cells results node produced by the Filtered counts task
Click Normalization and scaling in the context-sensitive task menu on the right
Click Normalization
Click to add the recommended normalization scheme
This adds CPM (counts per million), Add 1, and Log2 to the Normalization order panel. Normalization steps are performed in order, from the top of the list down.
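The recommended scheme can be written out as a short sketch - CPM scaling per cell, then Add 1, then Log2, in that order (illustrative only; Flow performs these steps internally):

```python
import numpy as np

def normalize(counts):
    """CPM per cell, then Add 1, then Log2 - applied in that order.

    counts: features x cells matrix of raw counts (sketch only).
    """
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6  # counts per million
    return np.log2(cpm + 1)                                 # Add 1, then Log2
```

Adding 1 before the log transform keeps zero counts defined (log2 of 1 is 0) while leaving large values essentially unchanged.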
Click Finish to apply the normalization (Figure 4 )
A new Normalized counts data node will be produced. You can choose to change the color of this node by right-clicking on the task node then clicking Change color and/or rename the result node by right-clicking and selecting Rename data node.
In the example below, I have changed the color to dark blue and renamed the results node based on the scheme.
For more information on normalizing data in Partek Flow, please see the Normalize Counts section of the user manual.
Principal component analysis (PCA) is an exploratory technique that is used to describe the structure of high-dimensional data by reducing its dimensionality. Because PCA is used to reduce the dimensionality of the data prior to clustering as part of a standard single cell analysis workflow, it is useful to examine the results of PCA for your data set prior to clustering.
Click the Filtered counts node
Click Exploratory analysis in the task menu
Click PCA from the drop-down list
You can choose Features contribute equally to standardize the genes prior to PCA or allow more variable genes to have a larger effect on the PCA by choosing by variance. By default, we take variance into account and focus on the most variable genes.
If you have multiple samples, you can choose to run PCA for each sample individually or for all samples together by selecting or not selecting the Split by sample option (Figure 5).
Click Finish to run
A new PCA task node will be produced.
Double-click the PCA task node to open the 3D PCA scatter plot in data viewer (Figure 6)
Besides the PCA coordinates of the cells, the PCA task report also includes the Scree plot, the component loadings table, and the PC projections table.
The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured as an eigenvalue. The higher the eigenvalue, the more variance the PC explains. Typically, after an initial set of highly informative PCs, the amount of variance explained by additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering, UMAP, and t-SNE.
To draw a Scree plot in the Data viewer, choose the Scree plot icon available in New plot under Setup on the left panel, then choose the PCA data node (Figure 7)
Note that Partek Flow suggests appropriate data for each plot type, so only PCA results will be available to select for the Scree plot.
Mouse over the Scree plot to identify the point where additional PCs offer little additional information
In this data set, a reasonable cut-off could be set anywhere between 7 and 20 PCs.
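The elbow heuristic can also be framed numerically. A small sketch (not part of Flow) that picks the smallest number of leading PCs explaining a target fraction of the total variance from a Scree plot's eigenvalues:

```python
import numpy as np

def pcs_for_variance(eigenvalues, target=0.8):
    """Smallest number of leading PCs whose eigenvalues jointly explain at
    least `target` of the total variance (assumes target is reachable)."""
    frac = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    return int(np.searchsorted(frac, target) + 1)
```

With eigenvalues 4, 2, 1, 1, the first PC alone explains half the variance, while three PCs are needed to reach 80%.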
Viewing the genes correlated with each PC can be useful when choosing how many PCs to include.
Click the Table option in the New plot icon under Setup and select the PCA data node to open the Component loadings table (Figure 8)
This table lists genes on rows and PCs on columns; the values in the table are correlation coefficients (r). The table can be downloaded as a text file by clicking the Export table data icon in the upper-right corner of the plot.
To display the PCA projections table, click on the Table drop-down list in the Content icon under Configure and choose PCA projections (Figure 9)
In the PCA projections table, each row is an observation (a cell, in this case) and each column represents one principal component (Figure 10). The table can be downloaded as a text file, in the same way as the component loadings table.
Graph-based clustering identifies groups of similar cells using PC values as the input. By including only the most informative PCs, noise in the data set is excluded, improving the results of clustering.
Click the PCA data node
Click Exploratory analysis in the task menu
Click Graph-based clustering
Clustering can be performed on each sample individually or on all samples together. Here, we are working with a single sample.
Check Compute biomarkers to compute features that are highly expressed when comparing each cluster (Figure 11)
Click Configure to access the Advanced options and change the Number of nearest neighbors to 50 and Nearest Neighbor Type to K-NN for this example tutorial.
The Number of principal components should be set based on your examination of the Scree plot and component loadings table. The default value of 100 is likely exhaustive for most data sets, but may introduce noise that reduces the number of clusters that can be distinguished.
Click Finish to run the task
New Graph-based clusters and Biomarkers data nodes will be generated along with the task node.
Double-click the Graph-based clusters node to see the cluster results and statistics (left screenshot on Figure 12)
Double-click the Biomarkers node to see the computed biomarkers if you have selected this option (right screenshot on Figure 12)
The Graph-based clustering result lists the Total number of clusters and the proportion of cells that fall into each cluster, as well as the Maximum modularity, a measure of clustering quality where the optimal modularity is 1. The Biomarkers node includes the top features for each graph-based cluster, displaying the top 10 genes that distinguish each cluster from the others. The Download link at the bottom right of the table can be used to view and save more features. These are calculated using an ANOVA test comparing the cells in each group to all the other cells, filtering to genes that are at least 1.5-fold upregulated, and sorting by ascending p-value. This ensures that the top 10 genes of each cluster are highly and disproportionately expressed in that cluster.
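The biomarker ranking described above can be approximated outside of Flow. The sketch below mimics the logic - compare each cluster to all other cells, keep genes at least 1.5-fold upregulated, and sort by ascending p-value - but it is a simplified stand-in for Flow's implementation (the function name and small offset constant are ours):

```python
import numpy as np
from scipy.stats import f_oneway

def top_biomarkers(expr, clusters, cluster_id, min_fold=1.5, n_top=10):
    """Rank genes upregulated in one cluster versus all other cells.

    expr: genes x cells matrix (linear scale); clusters: per-cell labels.
    A simplified stand-in for Flow's biomarker computation.
    """
    in_cluster = clusters == cluster_id
    results = []
    for g in range(expr.shape[0]):
        a, b = expr[g, in_cluster], expr[g, ~in_cluster]
        fold = (a.mean() + 1e-9) / (b.mean() + 1e-9)  # small offset avoids 0/0
        if fold >= min_fold:                          # keep upregulated genes only
            p = f_oneway(a, b).pvalue                 # two-group one-way ANOVA
            results.append((g, fold, p))
    results.sort(key=lambda r: r[2])                  # ascending p-value
    return results[:n_top]
```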
We will use t-SNE to visualize the results of Graph-based clustering.
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensional reduction technique that prioritizes local relationships to build a low-dimensional representation of the high-dimensional data that places objects that are similar in high-dimensional space close together in the low-dimensional representation. This makes t-SNE well suited for analyzing high-dimensional data when the goal is to identify groups of similar objects, such as cell types in single cell RNA-Seq data.
Click the Graph-based clusters node
Click Exploratory analysis in the task menu
Click t-SNE
If you have multiple samples, you can choose to run t-SNE for each sample individually or for all samples together using the Split cells by sample option. Please note that this option will not be present if you are running t-SNE on a clustering result. For clarity, clustering results run with all samples together must be viewed together and clustering results run by sample must be viewed by sample.
Like Graph-based clustering, t-SNE takes PC values as its input and further reduces the data down to two or three dimensions. For consistency, you should use the same number of PCs as the input for t-SNE that you used for Graph-based clustering.
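The relationship between the two tasks can be sketched with common open-source tools - reduce to the same number of PCs, then run t-SNE on those PC values (illustrative only; Flow's parameters and implementation may differ, and the data here is random):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
cells = rng.normal(size=(60, 200))  # hypothetical cells x genes matrix

n_pcs = 15                          # use the same PC count as for clustering
pcs = PCA(n_components=n_pcs).fit_transform(cells)
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(pcs)
# `embedding` holds one 2-D coordinate per cell, ready to plot.
```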
Click Apply
Click Finish to run (Figure 13)
A new t-SNE task node will be produced.
Double-click the t-SNE node to open the t-SNE task report (Figure 14). Use the panel on the left to modify the plot or add more plots to this Data viewer session.
The t-SNE scatter plot is interactive and can be viewed in 2D or 3D. The t-SNE plot is 3D by default. You can rotate the 3D plot by left-clicking and dragging your mouse or using Control under Configure. You can zoom in and out using your mouse wheel, and pan by right-clicking and dragging your mouse. You can use Style to modify color, shape, size, and labeling (e.g. add a fog effect to improve depth perception on the plot). Add a 2D plot by clicking New plot, selecting 2D Scatter plot, and selecting t-SNE as the source of the data.
Click on the plot to ensure that the plot window is selected. Click Style under Configure to color the t-SNE.
Choose from the options in the drop-down menu under Color. You should be on the Normalized counts node, which you can verify by hovering over or clicking the circle (node) to the right of the drop-down.
Click the text field in the drop-down and start typing CD79A then select the gene by clicking on it (Figure 15)
The cells on the plot will be colored based on their expression level of CD79A (Figure 16). In the example in Figure 16, the Style icon has been dragged to a different location on the screen and the legend has also been resized and moved. Resizing the legend can either be done on the legend itself or using the Description icon under Configure.
Coloring by one gene uses the two-color numeric palette, which can be customized by clicking its icon. To color by more than one gene, use the Numeric triad option in the drop-down. If you color by more than one gene, the color palette switches to a Green-Red-Blue color scheme, with the balance between the three color channels determined by the values of the three genes. For example, a cell that expresses all three genes would be white, a cell that expresses the first two genes would be yellow, and a cell that expresses none of the genes would be black (Figure 17).
Clicking a cell on the plot shows the expression values of the cell in the legend. Hovering over a cell on the plot also shows this information and related details (Figure 18).
If you want to color by more than three genes at a time, such as by a list of genes that distinguish a particular cell type, you can use the color by Feature list option.
Select Feature List from the Color by drop-down
Choose Cytotoxic cells from the List drop-down (use List management in Settings to add lists to Partek Flow which will automatically make them available here)
Choose PCA from the Metric drop-down
Coloring by a list, in this way, calculates the first three principal components for the gene list and colors the cells on the plot by their values along those three PCs with green for PC1, red for PC2, and blue for PC3 (Figure 19).
Typically, the expression of a set of marker genes will be highly correlated, allowing the first PC to account for a large percentage of the variance between cells for that gene list. As a result, the group of cells characterized by their expression of the genes on the list will separate from the rest of the cells along PC1 and will be colored green (Figure 16). If the gene list is more complex, for example, including marker genes for multiple cell types, there may be several sets of correlated genes accounting for significant amounts of variance, leading to groups of cells being distinguishable along PC2 and PC3 as well. In that case, there may be green, blue, and red groups of cells on the plot. If the gene list does not distinguish any group of cells, all cells will have similar PC values, leading to similarly colored cells on the plot.
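The gene-list coloring logic described above can be sketched as a PCA restricted to the genes on the list, with the first three PC scores per cell mapped to the green, red, and blue channels (a conceptual illustration; the function name is ours):

```python
import numpy as np

def list_coloring(expr, genes, gene_list):
    """First three PC scores of the cells, computed only over `gene_list`.

    expr: genes x cells matrix; genes: list of gene names for the rows.
    The three returned columns map to the green, red, and blue channels.
    """
    idx = [genes.index(g) for g in gene_list]
    sub = expr[idx].T                           # cells x list-genes
    centered = sub - sub.mean(axis=0)           # mean-center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                              # PC scores per cell
    return scores[:, : min(3, scores.shape[1])]
```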
In addition to coloring by gene expression and by gene lists, the points can be colored by any cell or sample attribute. Available attributes are listed as options in the Color by drop-down menu. Note that any available options are dependent upon the selected data node. In the following section we will use the attribute Graph-based to color our cells by the clusters identified in the Graph-based clustering task (Figure 20).
The most basic way to select a point on the scatter plot is to click it with the mouse while in pointer mode. To select multiple cells, you can hold Ctrl on your keyboard and click the cells. To select larger groups of cells, you can switch to Lasso mode by clicking in the plot controls on the right hand side. The lasso lets you freely draw a shape to select a cluster of cells.
Click to activate Lasso mode
Left-click and hold to draw a lasso around a cluster of cells
Release and click the starting circle to close the lasso and select the enclosed cells (Figure 21)
You can also create a lasso with straight lines using Lasso mode by clicking, releasing, and clicking again to draw a shape.
By default, selected cells are shown in bold while unselected cells are dimmed (Figure 22). This can be changed to gray selected cells using the Select & Filter tool in the left panel as shown in Figure 22.
Double-click any blank section of the scatter plot to clear the selection
Alternatively, you can select cells using any criteria available for the data node that is selected in the Select & Filter tool. To change the data selection click the circle (node) and select the data.
Choose Graph-based from the Criteria drop-down menu in the Select & Filter tool after ensuring you are on the Graph-based clusters node by hovering over the circle (Figure 23). If you are not on the correct node, click the circle and select the data.
This adds check boxes for each level of the attribute (i.e., clusters). Click a check box to select the cells with that attribute level.
Click only 2 and 3
This selects cells from Graph-based clusters 2 and 3 (Figure 24). The number of selected cells is listed in the Legend on the plot.
Cells can also be selected based on their gene expression values in the Select & Filter section.
Click the circle and select the Normalized counts node which has gene expression data
Type cd3d in the text field of the drop-down
Click on CD3D to add it as criteria to select from and use the slider or text field to adjust the selected values. Pin the histogram to visualize the distribution during selection.
Very specific selections can be configured by adding criteria in this way. In the example below, cells from clusters 2 and 3 with high CD3D expression are selected (Figure 25).
Once a cell has been selected on the plot, it can be filtered. The filter controls can exclude or include (only) any selected cell. Filtering can be particularly useful when you want to use a gene expression threshold to classify a group of cells, but the gene in question is not exclusively expressed by your cell type of interest.
In this example we can filter to include just cells from the selection we have already made.
Click (filter include) to filter to just the selected cells (Figure 26).
The plot will update to show only the included cells as seen in Figure 26.
Cells that are not shown on the plot cannot be selected, allowing you to focus on the visible cells. The number of cells shown on the plot out of the total number of original cells is shown in the Legend. You can adjust the view to focus on only the included cells.
Use the plot controls or toggle on Fit visible in the Axes configuration to rescale the axes to the filtered points
To revert to the original scaling, click the button again or turn off Fit visible with the toggle.
Alternatively, to exclude selected cells, click (filter exclude) (Figure 27)
Additional inclusion or exclusion filters can be added to focus on a smaller subset of cells.
Click Clear filters to remove applied filters
The plot will update to show all cells and return to the original scaling.
Classifying cells allows you to assign cells to groups that can be used in downstream analysis and visualizations. Commonly, this is used to describe cell types, such as B cells and T cells, but it can describe any group of cells that you want to consider together in your analysis, such as cycling cells or CD14 high-expressing cells. Each cell can only belong to one class at a time, so you cannot create overlapping classes.
To classify a cell, just select it then click Classify selection in the Classify tool.
For example, we can classify a cluster of cells expressing high levels of CD79A as B cells.
Set Color by in the Style configuration to the normalized counts node
Type CD79A in the search box and select it. Rotate the 3D plot if you need to see this cluster more clearly.
Click to activate Lasso mode
Draw a lasso around the cluster of CD79A-expressing cells (Figure 28)
Because most of these cells express CD79A, a B cell marker, and because they cluster together on the t-SNE, suggesting they have similar overall gene expression, we believe that all these cells are B cells.
Click Classify under Tools in the left panel
Type B cells for the Name
Click Save (Figure 29)
You can edit the name of a classification or delete it. In this project, we use the hosted feature lists for "NK cells", "T cells", and "Monocytes" to classify these cell types, coloring the cells in the t-SNE plot and selecting the cells expressing those genes as shown above. See the list management documentation for more information on how to add these lists. The classifications you have made are saved as a working draft, so if you close the plot and return to it, they will still be there and can be visualized on the plot as "New classification". However, classifications are not available for downstream tasks until you apply them. Continue classifying the clusters and saving the Data viewer session until you are ready to apply the classification to the project.
Color by New classifications under Style (Figure 30) while you are still working on the classifications
To use the classifications in downstream tasks and visualizations, you must first apply them.
Click Apply classifications
Name the classification (e.g. Classified Cell Types)
Click Run to confirm
Once you have added a classification to the project, you can color the t-SNE plot by the Classification.
Here, I classified a few additional cell types using a combination of known marker genes and the clustering results then applied the classification (Figure 31).
Summarize Classifications with the number and percentage of cells from each sample that belong to each classification using an Attribute table under New plot. This is particularly useful when you are classifying cells from multiple samples.
Click New plot
Select Attribute table and the source of data (Figure 32) which in this case is called Classify result
Click on the Normalized counts node
Navigate to the Compute biomarkers task under Statistics in the task menu
Follow the task dialogue and click Finish (Figure 33)
Double click the Biomarkers node to view the Biomarkers results
A common goal in single cell analysis is to identify genes that distinguish a cell type. To do this, you can use the differential analysis tools in Partek Flow. I will show how to use the ANOVA test in Partek Flow, a statistical test shown to be highly effective for differential analysis of single cell RNA-Seq data.
Click the Normalized counts results node
Click Statistics in the toolbox
Click Differential Analysis
Select ANOVA as the Method to use for differential analysis
The first page of the configuration dialog asks what attributes you want to include in the statistical test. Here, we only want to consider the Classifications, but in a more complex experiment, you could also include experimental conditions or other sample attributes.
Click Classified Cell Types
Click Next (Figure 34)
We will make a comparison between NK cells and all the other cell types to identify genes that distinguish NK cells. You can also use this tool to identify genes that differ between two cell types or genes that differ in the same cell type between experimental conditions.
Drag NK cells to the top panel
The top panel is the numerator for fold-change calculations so the experimental or test groups should be selected in the top panel.
Click all the other classifications in the bottom panel
The bottom panel is the denominator for fold-change calculations so the control group should be selected in the bottom panel.
Click Add comparison
This adds the comparison to the statistical test.
Click Finish to run the ANOVA task (Figure 35)
Double-click the newly generated data node to open the ANOVA task report
The ANOVA task report lists genes on rows and the results of the statistical test (p-value, fold change, etc.) on columns (Figure 36). For more information, please see our documentation page on the ANOVA task report.
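As described in the fold-change notes earlier on this page, fold change is reported in linear scale and converted from the ratio of the two groups' LSmeans: the ratio itself when it is at least 1, otherwise -1/ratio, so no value falls between -1 and 1. A tiny sketch of that conversion:

```python
def fold_change(lsmean_top, lsmean_bottom):
    """Signed linear fold change: ratio when ratio >= 1, else -1/ratio.

    lsmean_top is the LSmean of the group in the top (numerator) panel,
    lsmean_bottom the group in the bottom (denominator) panel.
    """
    ratio = lsmean_top / lsmean_bottom
    return ratio if ratio >= 1 else -1.0 / ratio
```

A ratio of 2 gives a fold change of 2, a ratio of 0.5 gives -2, and a ratio of 1 gives 1 (no change).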
Genes are listed in ascending order by the p-value of the first comparison, so the most significant gene is listed first. To view a volcano plot for any comparison, click the volcano plot icon. To view a violin plot for a gene, click the icon next to the Gene ID.
Click the icon for CCL4
The Feature plot viewer will open showing a dot plot for CCL4 which can be modified to summarize the data in different ways (Figure 37). In the image below, the red boxes highlight the changes that were made to configure the plot. This includes overlaying the violins (density plots with the width corresponding to frequency) on the dot plot represented by the Classified Cell Types.
You can switch the grouping of cells. To do this, show the X axis labels then click and drag the labels to reposition the cell types on the plot.
Click ANOVA report to return to the table
The table lists all of the genes in the data set; using the filter control panel on the left, we can filter to just the genes that are significantly different for the comparison.
Click FDR step up and click the arrow next to it
Set to 1e-8
Here, we are using a very stringent cutoff to focus only on genes that are specific to NK cells, but other applications may require a less stringent cutoff.
Click Fold change and click the arrow next to it
Set to -2 to 2
The number of genes at the top of the filter control panel updates to indicate how many genes are left after the filters are applied.
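The two filters applied above can be expressed as a simple table filter - keep genes with FDR step up at or below 1e-8 and fold change outside the -2 to 2 range (toy table; the gene names, values, and column names here are hypothetical):

```python
import pandas as pd

# Toy table mimicking a few columns of the ANOVA report (values hypothetical).
report = pd.DataFrame({
    "gene": ["CCL4", "GNLY", "ACTB", "MALAT1"],
    "fdr":  [1e-12, 1e-9, 0.3, 1e-10],
    "fold": [8.0, 5.5, 1.1, -1.2],
})

# Keep genes with FDR step up <= 1e-8 and fold change outside -2..2.
mask = (report["fdr"] <= 1e-8) & (report["fold"].abs() >= 2)
significant = report[mask]
```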
Click the filter icon to generate a filtered version of the table for downstream analysis
The ANOVA report will close and a new task, the Differential analysis filter, will run and generate a filtered Feature list data node.
For more information about the ANOVA task, please see the Differential Gene Expression - ANOVA section of our user manual.
Once we have filtered to a list of significantly different genes, we can visualize these genes by generating a heatmap.
Click the Filtered feature list data node produced by the Differential analysis filter
Click Exploratory analysis in the toolbox
Click Hierarchical clustering / heatmap
The hierarchical clustering task will generate the heatmap; choose Heatmap as the plot type. You can choose to Cluster features (genes) and cells (samples) under Feature order and Cell order in the Ordering section. You will almost always want to cluster features as this generates the clear blocks of color that make heatmaps comprehensible. For single cell data sets, you may choose to forgo clustering the cells in favor of ordering them by the attribute of interest. Here, we will not filter the cells, but instead order them by their classification.
Click Assign order under Cell order
You can filter samples using the Filtering section of the configuration dialog. Here, we will not filter out any samples or cells.
Choose Classification from the Ordering drop-down menu
Drag NK cells to the top of the Sample order
Click Finish to run (Figure 38)
Double-click the Hierarchical cluster task node to open the task report
It may initially be hard to distinguish striking differences in the heatmap. This is common in single cell RNA-Seq data because outlier cells will skew the high and low ends. We can adjust the minimum and maximum of the color scheme to improve the appearance of the heatmap.
Click Heatmap
Toggle on the Range Min and set to -2
Toggle on the Range Max and set to 2
Distinct blocks of red and blue are now more pronounced on the plot. Cells are on rows and genes are on columns. Because of the limited number of pixels on the screen, genes are grouped. You can zoom in using the zoom controls or your mouse wheel if you want to view individual gene rows. We can annotate the plot with cell attributes.
Choose Classified Cell Types from the Annotations drop-down menu
Change the Annotation font size under Style in the Annotations section
The plot now includes blocks of color along the left edge indicating the classification of the cells. We can transpose the plot to give the cell labels a bit more space.
Click Transposed under Axes or use the transpose button on the plot to flip the axes
Toggle off the Row labels under Axes to remove the sample labels
As with any visualization in Partek Flow, the image can be saved as a publication-quality file to your local machine or sent to a page in the project notebook using the export controls. For more information about Hierarchical clustering, please see the Hierarchical Clustering section of the user manual.
While a long list of significantly different genes is important information about a cell type, it can be difficult to identify what the biological consequences of these changes might be just by looking at the genes one at a time. Using enrichment analysis, you can identify gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.
Click the Feature list data node produced by the Differential analysis filter
Click Biological interpretation
Click Gene set enrichment
We distribute the gene sets from the Gene Ontology Consortium, but Gene set enrichment can work with any custom or public gene set database.
Choose the latest assembly available from the Gene set drop-down
Click Finish
Double-click the Gene set enrichment task node to open the task report
The Gene set enrichment task report lists gene sets on rows with an enrichment score and p-value for each. It also lists how many genes in the gene set were in the input gene list and how many were not (Figure 40). Clicking the Gene set ID links to the geneontology.org page for the gene set.
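Over-representation of a gene set in a significant gene list is commonly assessed with a one-sided Fisher's exact test on a 2x2 table of in-list/not-in-list versus in-set/not-in-set counts. A sketch of that idea (not necessarily the exact statistic Flow reports as its enrichment score):

```python
from scipy.stats import fisher_exact

def enrichment_p(list_in_set, list_not_in_set, bg_in_set, bg_not_in_set):
    """One-sided Fisher's exact p-value for over-representation of a gene
    set in the significant list versus the remaining background genes."""
    table = [[list_in_set, list_not_in_set],
             [bg_in_set, bg_not_in_set]]
    _, p = fisher_exact(table, alternative="greater")
    return p
```

For example, 10 of 100 significant genes falling into a set that covers only 0.5% of the background is highly significant, while 0 of 100 is not.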
In Partek Flow, you can also check for enrichment of KEGG pathways using the Pathway enrichment task. The task is quite similar to the Gene set enrichment task, but uses KEGG pathways as the gene sets.
The task report is similar to the Gene set enrichment task report with enrichment scores, p-values, and the number of genes in and not in the list (Figure 41).
Clicking the KEGG pathway ID in the Pathway enrichment task report opens a KEGG pathway map (Figure 42). The KEGG pathway maps have fold-change and p-value information from the input gene list overlaid on the map, adding a layer of additional information about whether the pathway was upregulated or downregulated in the comparison.
Colors are customizable using the control panel on the left, and the plot is interactive. Mousing over a gene box shows the genes represented by that box, with genes present in the input list shown in bold and the gene used for coloring shown in red (Figure 43).
Clicking a pathway box opens the map of that pathway, providing an easy way to explore related gene networks.
For information about automating steps in this analysis workflow, please see our documentation page on Making a Pipeline.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Next, we will perform some exploratory analysis on the merged mRNA and protein expression data and visualize the data in preparation to identify cell populations. Because the merged count matrix has thousands of features, it is a good idea to reduce the dimensionality of the data for more efficient downstream processing.
Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click PCA
Click Finish to run the PCA with default settings (Figure 1)
A PCA task node will be added to the pipeline under the Analyses tab and a circular PCA output data node will be produced (Figure 2).
Once the task completes, we will inspect the results to decide the optimal number of principal components (PCs) to use in downstream analyses. To do this, we will use a Scree plot.
Double click the PCA data node to open the task report
The PCA plot will open in a new data viewer session. A 3D scatterplot will be displayed on the canvas (Figure 3).
Click and drag the Scree plot from New plot under Setup on the left onto the canvas
Drop it over the Replace option (Figure 4)
Select PCA as data for the new Scree plot (Figure 5)
The Scree plot (Figure 6) shows the eigenvalues on the y-axis for each of the 100 PCs on the x-axis. The higher the eigenvalue, the more variance explained by each PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional components is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering and UMAP.
Click and drag over the first set of PCs to zoom in (Figure 7)
Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 8)
In this data set, a reasonable cut-off could be set anywhere between around 10 and 30 PCs. We will use 15 in downstream steps.
We can use Graph-based clustering to group similar cells together in an unsupervised manner.
Click the project name near the top to go back to the Analyses tab
Click the circular PCA data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Check Compute biomarkers
Set the number of principal components to 15 (Figure 9)
Click Configure under Advanced options and change the Resolution to 1.0
Click Finish to run the task
A Graph-based clustering task node will be added to the pipeline under the Analyses tab and a circular Graph-based clusters output data node will be produced (Figure 10)
Once the graph-based clustering task has completed, we can visualize the results with a UMAP plot. You could use the same steps here to generate a t-SNE plot. For this tutorial, we will use UMAP, as it is faster on several thousand cells.
Click the circular PCA data node
Click Exploratory analysis in the toolbox
Click UMAP
Set the number of principal components to 15 (Figure 11)
Click Finish to run the task
A UMAP task node will be added to the pipeline under the Analyses tab and a circular UMAP output data node will be produced (Figure 12)
In this tutorial, we have performed exploratory analysis on merged protein and gene expression data, and we will perform classification on the merged data in the next step.
It can be interesting to perform exploratory analysis on the two feature types separately. For example, you might be interested to see how the clustering of the same cells differs between protein expression profiles vs. gene expression profiles.
To perform exploratory analysis on the two feature types separately, select the Merged counts data node, click Pre-analysis tools, followed by Split by feature type from the toolbox. A new task, Split by feature type, will be added to the pipeline resulting in two output data nodes: Antibody capture (protein data) and Gene expression (mRNA data). Both contain the same high-quality cells.
Performing exploratory analysis with gene expression data is the same as for the merged counts. Because there are a large number of genes, you will need to reduce the dimensionality with PCA, choose an optimal number of PCs and perform downstream clustering and visualization (e.g. graph-based clustering and UMAP/t-SNE). Performing exploratory analysis with protein data is different. There is no need to reduce the dimensionality as there are only a handful of features (17 proteins in this case), so you can proceed straight to downstream clustering and visualization. Figure 13 shows an example of how the pipeline might look if the data is split and analyzed separately.
You can then use the Data viewer to bring together multiple plots for comparison (Figure 14).
Next, we will filter out certain cells and re-split the data. Re-splitting the data can be useful if you want to perform differential analysis and downstream analysis separately for proteins and genes. For your own analyses, re-splitting the data is optional. You could just as well continue with differential analysis with the merged data if you prefer.
Because we have classified our cells, we can now filter based on those classifications. This can be used to focus on a single cell type for re-clustering and sub-classification or to exclude cells that are not of interest for downstream analysis.
Click the Merged counts data node
Click Filtering
Click Filter cells
Set to exclude Cell type is Doublets using the drop-down menus
Click OR
Set the second filter to exclude Cell type is N/A using the drop-down menus
Click Finish to apply the filter (Figure 1)
This produces a Filtered counts data node (Figure 2).
Click the Filtered counts data node
Click Pre-analysis tools
Click Split by feature type
This will produce two data nodes, one for each data type (Figure 3). The split data nodes will both retain cell classification information.
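The filter-then-split logic above can be sketched in plain Python on toy data; the barcodes and feature names here are hypothetical stand-ins for the GUI operations:

```python
# Toy cells with the classifications used by the filter
cells = [
    {"barcode": "AAAC", "cell_type": "Mature B cells"},
    {"barcode": "AAAG", "cell_type": "Doublets"},
    {"barcode": "AACT", "cell_type": "N/A"},
    {"barcode": "AAGT", "cell_type": "Activated B cells"},
]
# Two exclude rules joined with OR: a cell is removed if either matches
kept = [c for c in cells if c["cell_type"] not in ("Doublets", "N/A")]

# Split by feature type: one output per data type, same cells in both
feature_type = {
    "CD3_TotalSeqB": "Antibody Capture",
    "CD19_TotalSeqB": "Antibody Capture",
    "CD4": "Gene Expression",
    "CD8A": "Gene Expression",
}
split = {}
for feature, ftype in feature_type.items():
    split.setdefault(ftype, []).append(feature)
```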
Once we have classified our cells, we can use this information to perform comparisons between cell types or between experimental groups for a cell type. In this project, we only have a single sample, so we will compare cell types.
Click the Antibody Capture data node
Click Statistics
Click Differential analysis
Click ANOVA then click Next
The first step is to choose which attributes we want to consider in the statistical test.
Click Cell type
Click Add factor
Click Next
Next, we will set up the comparison we want to make. Here, we will compare the Activated and Mature B cells.
Drag Activated B cells into the top panel
Drag Mature B cells into the bottom panel
Click Add comparison
The comparison should appear in the table as Activated B cells vs. Mature B cells.
Click Finish to run the statistical test (Figure 4)
The ANOVA task produces an ANOVA data node.
Double-click the ANOVA data node to open the task report
The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 5).
In addition to the listed information, we can access dot and violin plots for each gene or protein from this table.
This opens a dot plot in a new data viewer session, showing CD45A expression for cells in each of the classifications (Figure 6). First, we exclude Doublets and N/A cells from the plot:
Open Select and filter, select Criteria
Drag "Cell type" from the legend title to the Add criteria box
Uncheck Doublets and N/A
Click to include selected points
We can use the Configuration panel on the left to edit this plot.
Open the Style icon
Switch on Violins under Summary
Switch on Overlay under Summary
Switch on Colored under Summary
Select the Graph-based clustering node in the Color by section
Color by Graph-based clusters under Color and use the slider to decrease the Opacity
Open the Axes icon
Select the Graph-based clustering node in the X axis section
Change the X axis data to Graph-based clusters
Use the slider to increase the Jitter on the X axis (Figure 7)
Click the project name to return to the Analyses tab
To visualize all of the proteins at the same time, we can make a hierarchical clustering heat map.
Click the ANOVA data node
Click Exploratory analysis in the toolbox
Click Hierarchical clustering/heatmap
In the Cell order section, choose Graph-based clusters from the Assign order drop-down list
Click Finish to run with the other default settings
Double-click the Hierarchical clustering task node to open the heatmap
The heatmap can easily be customized using the tools on the left.
Open the Axes icon
Switch off Show Row labels
Increase the Font to 16 (Figure 8)
Activate the Transpose switch to swap the rows and columns; the row labels will now be shown (Figure 9)
Open the Dendrograms icon
Choose Row color By cluster and change Row clusters to 4
Change Row dendrogram size to 80 (Figure 10)
Open the Heatmap icon
Navigate to Range under Color
Set the Min and Max to -1.2 and 1.2, respectively
Change the Shape to Circle (Figure 11)
Switch the Shape back to Rectangle
Change the Color Palette by clicking on the color squares and selecting colors from the rainbow. Click outside of the selection box to exit this selection. The color options can be dragged along the Palette to highlight value differences (Figure 12).
Feel free to explore the other tool options on the left to customize the plot further.
We can use a similar approach to analyze the gene expression data.
Click the project name to return to the Analyses tab
Click the Gene Expression data node
Click Statistics
Click Differential analysis
Click ANOVA then click Next
Click Cell type
Click Add factor
Click Next
Drag Activated B cells into the top panel
Drag Mature B cells into the bottom panel
Click Add comparison
The comparison should appear in the table as Activated B cells vs. Mature B cells.
Click Finish to run the statistical test
As before, this will generate an ANOVA task node and an ANOVA data node.
Double-click the ANOVA task node to open the task report (Figure 13)
Because more than 20,000 genes have been analyzed, it is useful to use a volcano plot to get an idea about the overall changes.
The Volcano plot opens in a new data viewer session, in a new tab in the web browser. It shows each gene as a point with cutoff lines set for P-value (y-axis) and fold-change (x-axis). By default, the P-value cutoff is set to 0.05 and the fold-change cutoff is set at |2| (Figure 14).
Click the ANOVA report tab in your web browser to return to the full report
We can filter the full set of genes to include only the significantly different genes using the filter panel on the left.
Click FDR step up
Type 0.05 for the cutoff and press Enter on your keyboard
Click Fold change
Set to From -2 to 2 and press Enter on your keyboard
The number at the top of the filter will update to show the number of included genes (Figure 15).
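The same significance filter can be sketched on hypothetical report rows (the gene names and values below are made up for illustration):

```python
# Toy records mirroring the ANOVA report columns
genes = [
    {"gene": "G1", "fdr": 0.001, "fold_change": 3.5},
    {"gene": "G2", "fdr": 0.20,  "fold_change": -4.0},
    {"gene": "G3", "fdr": 0.01,  "fold_change": 1.2},
    {"gene": "G4", "fdr": 0.04,  "fold_change": -2.5},
]
# Keep genes with FDR step up <= 0.05 AND fold change outside -2..2
significant = [g["gene"] for g in genes
               if g["fdr"] <= 0.05 and abs(g["fold_change"]) >= 2]
```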
A task, Differential analysis filter, will run and generate a new Filtered Feature list data node. We can get a better idea about the biology underlying these gene expression changes using gene set or pathway enrichment. Note, you need to have the Pathway toolkit enabled to perform the next steps.
Click the Filtered feature list data node
Click Biological interpretation in the toolbox
Click Pathway enrichment
Make sure that Homo sapiens is selected in the Species drop-down menu
Click Finish to run
Double-click the Pathway enrichment task node to open the task report
The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 16).
To get a better idea about the changes in each enriched pathway, we can view an interactive KEGG pathway map.
Click path:hsa05202 in the Transcriptional misregulation in cancer row
The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 17).
Click the Normalized counts data node
Expand the Exploratory analysis section of the task menu
Click PCA
In this tutorial we will modify the PCA task parameters to not split by sample, keeping the cells from both samples in the same PCA output.
Uncheck (de-select) the Split by sample checkbox under Grouping
Click Finish
Double-click the circular PCA node to view the results
From this PCA node, further exploratory tasks can be performed (e.g. t-SNE, UMAP, and Graph-based clustering).
Choose Style under Configure
Under Color by, search for FASN by typing its name
Select FASN from the drop-down
The colors can be customized by selecting the color palette then using the color drop-downs as shown below.
Ensure the colors are distinguishable, for example by using a blue and green scale for Maximum and Minimum, respectively, as in the image above.
Click FASN in the legend to make it draggable (pale green background) and continue to drag and drop FASN to Add criteria within the Select & Filter Tool
Hover over the slider to see the distribution of FASN expression
Multiple gene thresholds can be used in this type of classification by performing this step with multiple markers.
Drag the slider to select the population of cells expressing high FASN (the cutoff here is 10, roughly the middle of the distribution).
Click Classify under Tools
Click Classify selection
Give the classification a name "FASN high"
Under the Select & Filter tool, choose Filter to exclude the selected cells
Exit all Tools and Configure options
Click the "X" in the right corner
Use the rectangle selection mode on the PCA to select all of the points on the image
This results in 147538 cells selected.
Open Classify
Click Classify selection and name this population of cells "FASN low"
Click Apply classifications and give the classification a name "FASN expression"
Now we will be able to use this classification in downstream applications (e.g. differential analysis).
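The high/low classification above amounts to a single threshold rule. Here is a sketch on toy expression values (the cell IDs and counts are hypothetical; the cutoff of 10 comes from the slider step above):

```python
# Hypothetical normalized FASN expression per cell
fasn = {"c1": 2.0, "c2": 15.5, "c3": 0.0, "c4": 11.2, "c5": 7.8}
threshold = 10.0  # slider cutoff from the tutorial

classification = {cell: ("FASN high" if x >= threshold else "FASN low")
                  for cell, x in fasn.items()}
```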
We will now examine the results of our exploratory analysis and use a combination of techniques to classify different subsets of T and B cells in the MALT sample.
Double click the merged UMAP data node
Under Configure on the left, click Style, select the Graph-based cluster node, and color by the Graph-based attribute (Figure 1)
The 3D UMAP plot opens in a new data viewer session (Figure 2). Each point is a different cell and they are clustered based on how similar their expression profiles are across proteins and genes. Because a graph-based clustering task was performed upstream, a biomarker table is also displayed under the plot. This table lists the proteins and genes that are most highly expressed in each graph-based cluster. The graph-based clustering found 11 clusters, so there are 11 columns in the biomarker table.
Click and drag the 2D scatter plot icon from New plot onto the canvas (Figure 2)
Drop the 2D scatter plot to the right of the UMAP plot
Click Merged counts to use as data for the 2D scatter plot (Figure 3)
A 2D scatter plot has been added to the right of the UMAP plot. The points in the 2D scatter plot are the same cells as in the UMAP, but they are positioned along the x- and y-axes according to their expression level for two protein markers: CD3_TotalSeqB and CD4_TotalSeqB, respectively (Figure 4).
In Select & Filter, click Criteria to change the selection mode
Click the blue circle next to the Add rule drop-down menu (Figure 5)
Click Merged counts to change the data source
Choose CD3_TotalSeqB from the drop-down list (Figure 6)
Click and drag the slider on the CD3_TotalSeqB selection rule to include the CD3 positive cells (Figure 7)
As you move the slider up and down, the corresponding points on both plots will dynamically update. The cells with a high expression for the CD3 protein marker (a marker for T cells) are highlighted and the deselected points are dimmed (Figure 8).
Click Merged counts in Get data on the left under Setup
Click and drag CD8a_TotalSeqB onto the 2D scatter plot (Figure 9)
Drop CD8a_TotalSeqB onto the x-axis configuration option
The CD3 positive cells are still selected, but now you can see how they separate into CD4 and CD8 positive populations (Figure 10).
The simplest way to classify cell types is to look for the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone, as the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. Let's compare the resolution power of the CD4 and CD8A gene expression markers compared to their protein counterparts.
Click the duplicate plot icon above the 2D scatter plot (Figure 11)
Click Merged counts in the Get Data icon under Setup
Search for the CD4 gene
Click and drag CD4 onto the duplicated 2D scatter plot
Drop the CD4 gene onto the y-axis option
Search for the CD8A gene
Click and drag CD8A onto the duplicated 2D scatter plot
Drop the CD8A gene onto the x-axis option
The second 2D scatter plot has the CD8A and CD4 mRNA markers on the x- and y-axis, respectively (Figure 12). The protein expression data has a better dynamic range than the gene expression data, making it easier to identify sub-populations.
Manually select the cells with high expression of the CD4_TotalSeqB protein marker (Figure 13)
More than 2000 cells show positive expression for the CD4 cell surface protein.
Let's perform the same test on the gene expression data.
Click on a blank spot on the plot to clear the selection
Manually select the cells with high expression of the CD4 gene marker (Figure 14)
This time, only 500 cells show positive expression for the CD4 marker gene. This means that the protein data is less sparse (i.e. there are fewer zero counts), which further helps to reliably detect sub-populations.
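The sparsity difference can be quantified as the fraction of zero counts per feature. The vectors below are hypothetical illustrations, not tutorial data:

```python
# One marker measured two ways, across the same eight toy cells
protein = [5, 3, 0, 8, 2, 7, 1, 4]   # CITE-Seq antibody counts
mrna    = [0, 0, 1, 0, 3, 0, 0, 2]   # matching gene counts

def zero_fraction(values):
    """Fraction of cells with a zero count: a simple sparsity measure."""
    return sum(1 for v in values if v == 0) / len(values)
```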
Based on the exploratory analysis above, most of the CD3 positive cells are in the group of cells in the right side of the UMAP plot. This is likely to be a group of T cells. We will now examine this group in more detail to identify T cell sub-populations.
Draw a lasso around the group of putative T cells (Figure 15)
Click and drag the plot to rotate it around
This group of putative T cells predominantly consists of cells assigned to graph-based clusters 3, 4, and 6, indicated by the colors. Examining the biomarker table for these clusters can help us infer different types of T cell.
Click and drag the bar between the UMAP plot and the biomarker table to resize the biomarker table to see more of it (Figure 17)
Cluster 6 has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Another biomarker is the PD-1 protein, which is expressed in Tfh cells. This protein promotes self-tolerance and is a target for immunotherapy drugs. The TIGIT protein is also expressed in cluster 6 and is another immunotherapy drug target that promotes self-tolerance.
Cluster 4 expresses several marker genes associated with cytotoxicity (e.g. NKG7 and GZMA) and both CD3 and CD8 proteins. Thus, these are likely to be cytotoxic cells.
We can visually confirm these expression patterns and assess the specificity of these markers by coloring the cells on the UMAP plot based on their expression of these markers.
Click the duplicate plot icon above the UMAP plot
We will color the cells on the duplicate by their expression of marker genes, while keeping the original plot colored by graph-based cluster assignment.
Click and drag the CXCL13 gene from the biomarker table onto the duplicate UMAP plot
Drop the CXCL13 gene onto the Green (feature) option (Figure 18)
Click and drag the NKG7 gene from the biomarker table onto the duplicate UMAP plot
Drop the NKG7 gene onto the Red (feature) option
The cells with higher CXCL13 and NKG7 expression are now colored green and red, respectively. By looking at the two UMAP plots side by side, you can see these two marker genes are localized in graph-based clusters 6 and 4, respectively (Figure 19).
Click the blue circle next to the Add criteria drop-down list
Search for Graph to search for a data source
Select Graph-based clustering (derived from the Merged counts > PCA data nodes)
Click the Add criteria drop-down list and choose Graph-based to add a selection rule (Figure 20)
In the Graph-based filtering rule, click All to deselect all cells
Click cluster 6 to select all cells in cluster 6
Using the Classify tool, click Classify selection
Label the cells as Tfh cells (Figure 21)
Click Save
Click cluster 4 to select all cells in cluster 4
In the Classify icon, click Classify selection
Label the cells as Cytotoxic cells
Click Save
We can classify the remaining cells as helper T cells, as they predominantly express the CD4 protein marker.
Click on the invert selection icon in either of the UMAP plots (Figure 22)
In Classify, click Classify selection
Label the cells as Helper T cells
Click Save
Let's look at our progress so far, before we classify subsets of B-cells.
Click the Clear filters link in Select & Filter
Select the duplicate UMAP plot (with the cell colored by marker genes)
Under Configure on the left, open Style and color the cells by New classifications (Figure 23)
In addition to T-cells, we would expect to see B lymphocytes, at least some of which are malignant, in a MALT tumor sample. We can color the plot by expression of a B cell marker to locate these cells on the UMAP plot.
In the Get data icon on the left, click Merged counts
Scroll down or use the search bar to find the CD19_TotalSeqB protein marker
Click and drag the CD19_TotalSeqB marker over to the UMAP plot on the right
Drop the CD19_TotalSeqB marker over the Color configuration option on the plot
The cells in the UMAP plot are now colored from grey to blue according to their expression level for the CD19 protein marker (Figure 24). The CD19 positive cells correspond to several graph-based clusters. We can filter to these cells to examine them more closely.
Lasso around the CD19 positive cells (Figure 25)
The plots will rescale to include the selected points. The CD19 positive cells include cells from graph-based clusters 1, 2 and 7 (Figure 26).
Find the CD3_TotalSeqB protein marker in the biomarker table
Click and drag the CD3_TotalSeqB onto the UMAP plot on the right
Drop the CD3_TotalSeqB protein marker onto the Color configuration option on the plot (Figure 27)
While these cells express T cell markers, they also group closely with other putative B cells and express B cell markers (CD19). Therefore, these cells are likely to be doublets.
Select either of the UMAP plots
Click on the Select & Filter
Find the CD3_TotalSeqB protein marker in the biomarker table
Click and drag CD3_TotalSeqB onto the Add criteria drop-down list in Select & Filter (Figure 28)
Set the minimum threshold to 3 in the CD3_TotalSeqB selection (Figure 29)
Click the Classify icon then click Classify selection
Label the cells as Doublets
Click Save
The biomarkers for clusters 1 and 2 also show an interesting pattern. Cluster 1 lists IGHD as its top biomarker, while cluster 2 lists IGHA1 as the fourth most significant. Both IGHD (Immunoglobulin Heavy Constant Delta) and IGHA1 (Immunoglobulin Heavy Constant Alpha 1) encode classes of the immunoglobulin heavy chain constant region. IGHD is part of IgD, which is expressed by mature B cells, and IGHA1 is part of IgA1, which is expressed by activated B cells. We can color the plot by both of these genes to visualize their expression.
Click, drag and drop IGHD from the biomarker table onto the Green (feature) configuration option on the UMAP plot on the right
Click, drag and drop IGHA1 from the biomarker table onto the Red (feature) configuration option on the UMAP plot on the right (Figure 30)
We can use the lasso tool to select and classify these populations.
Lasso around the IGHD positive cells (Figure 31)
In the Classify icon on the left, click Classify selection
Label the cells as Mature B cells
Click Save
Lasso around the IGHA1 positive cells (Figure 32)
In the Classify icon on the left, click Classify selection
Label the cells as Activated B cells
Click Save
We can now visualize our classifications.
Click the Clear filters link in the Select & Filter icon on the left
Select the duplicate UMAP plot (with the cell colored by marker genes)
Under Configure on the left, click the Style icon and color the cells by New classifications (Figure 33)
Click Apply classifications in the Classify icon
Name the attribute Cell type
Click Run
Click OK to close the message about a classification task being enqueued
A basic example of a spatial data analysis, starting from the Single cell counts node, is shown below and is similar to with the addition of the Spatial report task (shown) or Annotate Visium image task (not shown).
A context-sensitive menu will appear on the right side of the pipeline. Use the drop-downs in the toolbox to open available tasks for the selected data node.
Remove gene expression counts that are not relevant to the analysis.
Click the Filtering drop-down in the toolbox
Click the Filter Features task
Choose Noise reduction
Exclude features where value <= 0.0 in at least 99.0% of the cells
Click Finish
Remove gene expression values that are zero in the majority of the cells.
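The noise-reduction rule above (exclude a feature if its value is <= 0 in at least 99% of cells) can be sketched on a toy count matrix:

```python
# Toy matrix: feature name -> counts across 100 cells
counts = {
    "GAPDH": [3] * 60 + [0] * 40,   # zero in 40% of cells -> kept
    "RARE1": [1] * 1 + [0] * 99,    # zero in 99% of cells -> filtered
}
min_frac = 0.99  # "in at least 99.0% of the cells"

kept = [gene for gene, vals in counts.items()
        if sum(1 for v in vals if v <= 0) / len(vals) < min_frac]
```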
A Filtered counts data node is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Filter features task node to indicate that the task is running.
Normalize (transform) the cells to account for variability between cells.
Select the Filtered Counts result node
Choose the Normalization task from the toolbox
Click Use recommended
Click Finish
Explore the data by dimension reduction and clustering methods.
Click the Normalized counts result node
Select the PCA task under Exploratory analysis in the toolbox
Unselect Split by Sample
Click Finish
The PCA result node generated by the PCA task can be visualized by double-clicking the circular node.
Single click the PCA result node
Select the Graph-based clustering task from the toolbox
Click Finish
The results of graph-based clustering can be viewed by PCA, UMAP, or t-SNE. Follow the steps outlined below to generate a UMAP.
Select the Graph-based clustering result node by single click
Select the UMAP task from the toolbox
Click Finish
Double-click the UMAP result node
The UMAP is automatically colored by the graph-based clustering result in the previous node. To change the color, click Style.
Click the Filtered counts node
From the Classification drop-down in the toolbox, select Classify cell type
Using the Managed classifiers, select the human Intestine Garnett classifier
Click Finish
The output of this task produces the Classify result node.
Double-click the Classify result node to view the cell count for each cell type and the top marker features for each cell type.
Publish cell attributes to the project to make this attribute accessible for downstream applications.
Click the Classify result node
Select Publish cell attributes to project under Annotation/Metadata
Name the cell attribute
Click Finish
The Publish cell attributes task can also be applied to other result nodes with cell annotation (e.g. click the graph-based clustering result node and follow the same steps).
An example of this completed task is shown below.
Since this attribute has been published, we can choose to right-click the Publish cell attributes to project node and remove this from the pipeline. This attribute will be managed in the Metadata tab (discussed below).
The name of the Cell attribute can be changed in the Metadata tab (right of the Analyses tab).
Click Manage
Click the Action dots
Choose Modify attribute
Rename the attribute Cell Type
Click Save
Click Back to metadata tab
Drag and drop the categories to rearrange their order. The order here determines the plotting order and legend in visualizations.
We can use these Cell attributes in analyses tasks such as Statistics (e.g. differential analysis comparisons) as well as to Style the visualizations in the Data Viewer.
Follow along to add files to your Xenium project: .
Filter out the control probes using the Filter features task
Choose Feature metadata filter
Include the Gene Expression features
Click Finish
This results in a Filter features task (rectangle) and a results data node (circle).
Right click the circle to Rename data node to "Filtered to only gene expression"
Click the circular "Filtered to only gene expression" results node and select the Single cell QA/QC task from the context sensitive menu on the right
When the task completes, the node will turn opaque and the progress bar will disappear
Select the "Filtered cells node" and choose the Normalization task from the Normalization and scaling drop-down in the task menu
Click the Use recommended button to proceed with these settings
Click Finish
This results in a Normalized counts node as shown below in the pipeline.
The fastq files are not pre-processed. The steps covered here will show you how to import and pre-process the Visium Spatial Gene Expression data with brightfield and fluorescence microscope images.
The sample used for this tutorial can be found in the . We will use the .
Choose the 10x Genomics Visium fastq import format
Click Next
Select the fastq files in the upload folder used for file transfer (select all sample files at one time; including R1 and R2 for each sample)
Click Finish
The prefix used for R1 and R2 fastq files should match; one sample is shown in this example.
The fastq files will be imported into the project as an Unaligned reads node.
From the unaligned reads node, select Space Ranger from the 10x Genomics drop-down in the toolbox.
Specify the type of 10x Visium assay; this tutorial uses the Visium CytAssist gene expression library as the assay type
If you have not done so already, a Cell Ranger reference should be created
Specify the Reference assembly
Select the Image and Probe set files that have already been transferred to the server for all samples
Choose visium-2-large as the Slide parameter because this Visium CytAssist sample used an 11 x 11 slide capture area
Click Finish
The Space Ranger task output results in a Single cell counts node.
The tissue image must be annotated to associate the microscopy image with the expression data.
Click the newly created Single cell counts data node
Click the Annotation/Metadata section in the toolbox
Click Annotate Visium image
Click on the Browse button to open the file browser and point to the file _spatial.zip, created by the Space Ranger task
Click Finish
Select the zipped image folder for each sample. The image zip file should contain six files, including the image files, a tissue positions text file, and a scale factor JSON file. The setup page shows the sample table (one sample per row).
You can find the location of the _spatial.zip file using the following steps. Select the Space Ranger task node (i.e. the rectangle) and then click on the Task Details (toolbox). Click on the Output files link to open the page with the list of files created by the Space Ranger task. Mouse over any of the files to see the directory in which the file is located. The figure below shows the path to the .zip file which is required for Annotate Visium image.
Mousing over a file on the Output files page shows a balloon with the file location.
A new data node, Annotated counts, will be generated.
The Annotated counts node is Split by sample. This means that any tasks performed from this node will also be split by sample. Invoke tasks from the Single cell counts node to combine samples for analyses.
Annotate Visium image task creates a new node, Annotated counts. Double click on the Annotated counts node to invoke the Data Viewer showing data points overlaid on top of the microscopy image.
Data Viewer session as a result of opening an Annotated counts data node. Each data point is a tissue spot.
Space Ranger output files are pre-processed 10x Genomics Visium data. The steps covered here will show you how to import and continue analyses with this pre-processed data from the Space Ranger pipeline. Partek Flow refers to this high cellular resolution data as Single cell counts; each point (spot) can contain 1-10 cells depending on the tissue type.
The project includes and output files in one project.
Obtain the filtered Count matrix files (h5 or HDF5) files and Spatial outputs for each sample
The spatial imaging outputs should be in compressed format.
Navigate the options to select 10x Genomics Visium Space Ranger output as the file format for input
Proceed to transfer files as shown below using the 10x Genomics Visium Space Ranger outputs importer.
Navigate to the appropriate files for each sample. Please note that the 10x Genomics Space Ranger output can be count matrix data as 1 filtered .h5 file per sample or sparse matrix files for each sample as 3 files (two .csv with one .mtx or two .tsv with one .mtx for each sample). The spatial output files should be in compressed format (.zip). The high resolution image can be uploaded and is optional.
Count matrix files and spatial outputs should be included for each sample. Once added, the Cells and Features values will update.
Once the download completes, the sample table will appear in the Metadata tab, with one row per sample.
For this tutorial, we do not need to edit or change any sample attributes.
Click Analyses to switch to the Analyses tab
For now, the Analyses tab has a starting node, a circular node called Single cell counts and also a rectangular task node called Spatial report which was automatically generated for this type of data. As you , forming a visual representation of your analysis pipeline.
Click the Spatial report node
Click Task report on the task menu
The spatial report will display the first sample (Replicate 1). We want to visualize all of the samples using the steps below.
Duplicate the plot by clicking the Duplicate plot button in the upper right controls (arrow 1)
Open the Axes configuration option (arrow 2)
Change the Sample on the duplicated image under Misc (arrow 3)
Each data point is a tissue spot. Duplicate and change the sample to view multiple samples.
Double click on the Annotated counts node to invoke the Data Viewer showing data points overlaid on top of the microscopy image
Follow the steps outlined above by duplicating the image to visualize the multiple samples
To modify the points on the image to show more of the background image use the Style configuration option.
Press and hold Ctrl or Shift to select both plots
Click Style in the left panel
Move the Opacity slider to the left
Change the Point size to 3
Click Save in the left panel and give the session an appropriate name
Modify the axes to remove the X and Y coordinates from the tissue image.
Press and hold Ctrl or Shift to select both plots
Click Axes in the left panel
Toggle off Show lines for both the X & Y axis
Toggle off Show title and Show axis for both the X & Y axis
Style the image and color by normalized gene expression using three genes of interest.
Press and hold Ctrl or Shift to select both plots
Click Style
Select the Normalized counts node as the source
Choose to Color by Numeric triad
Use the Green drop-down to select IL32, Red drop-down to select DES, and Blue drop-down to select PTGDS genes (type in name of gene in drop-down)
Increase the Point size to 11
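Numeric-triad coloring blends three expression values into one RGB color. This sketch assumes a simple linear scaling per channel and a hypothetical maximum of 10; the Data Viewer's actual mapping may differ:

```python
def channel(value, max_val=10.0):
    """Scale one gene's expression into a 0-255 color channel
    (linear scaling and max_val are illustrative assumptions)."""
    v = max(0.0, min(value, max_val))
    return round(255 * v / max_val)

def triad_color(il32, des, ptgds):
    """Map the three genes to (R, G, B): IL32 -> green, DES -> red,
    PTGDS -> blue, matching the drop-down choices above."""
    return (channel(des), channel(il32), channel(ptgds))
```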
The project includes and files in one project.
Obtain the Xenium Output Bundles (Figure 1) for each sample.
Navigate the options to select 10x Genomics Xenium Output Bundle as the file format for input. Choose to import 10x Genomics Xenium for your project (Figure 2).
You will need to decompress the Xenium Output Bundle zip file before they are uploaded to the server. After decompression, you can drag and drop the entire folder into the Transfer files dialog, all individual files in the folder will be listed in the Transfer files dialog after drag & drop, with no folder structure (Figure 4). The folder structure will be restored after upload is completed.
The Xenium output bundle should be included for each sample (Figure 5). Each sample requires the whole sample folder or a folder containing these 6 files: cell_feature_matrix.h5, cells.csv.gz, cell_boundaries.csv.gz, nucleus_boundaries.csv.gz, transcripts.csv.gz, morphology_focus.ome.tif. Once added, the Cells and Features values will update. You can choose an annotation file during import that matches what was used to generate the feature count.
Do not limit cells with a total read count, since Xenium data is targeted to fewer features.
Once the download completes, the sample table will appear in the Metadata tab, with one row per sample (Figure 6).
For this tutorial, we do not need to edit or change sample attributes.
We will use the classification (FASN expression) we previously made based on expression levels of the FASN gene. Here, we will compare FASN-high and FASN-low cells to identify differentially expressed genes and pathways.
Select the Normalized counts node and choose Compute biomarkers from the Statistics drop-down
Choose the "FASN expression" attribute
Do not select Split by sample
Click Finish
This results in a Biomarkers report.
Double-click the Biomarkers results node to open the report
The top features are reported for the comparison.
Click your username in the top right corner
Select Settings from the drop-down
Choose Lists from the Components drop-down in the menu on the left
Use the + New list button to add these 10 genes
Choose Text as the list option
Give the list a Name and Description
Enter the 10 genes in column format as shown below
Click Add list
The list has been added and can now be used for further analysis. The Actions button can be used to modify this list if necessary, as shown below.
Go to the Analyses tab
Select the Normalized counts node
Choose Gene set enrichment from the Biological interpretation drop-down in the task menu
Use the KEGG database for pathway enrichment
Check Specify background gene list
Select "Top 10 FASN high Features" as the Background gene list
Click Finish
This results in a Pathway enrichment report, as shown below.
Double-click the report to view the pathways involved in this list of genes
In this tutorial, we demonstrate how to:
The tutorial is based on a published study of isocitrate dehydrogenase-mutant gliomas. Single cells from tumor biopsies were processed by flow cytometry and sequencing libraries were prepared according to the study's protocol. The tutorial data set consists of eight expression matrix files, one per patient sample. The tumors were categorized as either the astrocytoma or oligodendroglioma glioma subtype by histology. The matrix files contain gene expression values normalized by the transformation log2[(TPM/10)+1].
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
The tutorial data set is available through Partek Flow.
Click your avatar (Figure 1)
Click Settings
On the System information page, the Download tutorial data section includes pre-loaded data sets used by Partek Flow tutorials (Figure 2).
Click Single cell glioma (multi-sample)
The tutorial data set will be downloaded onto your Partek Flow server and a new project, Glioma (multi-sample), will be created. You will be directed to the Data tab of the new project. Because this is a tutorial project, there is no need to click on Import data, as the import is handled automatically (Figure 3).
You can wait a few minutes for the download to complete, or check the download progress by selecting Queue then View queued tasks... to view the Queue (Figure 4).
Once the download completes, the sample table will appear in the Data tab, with one row per sample (Figure 5).
For this tutorial, we do not need to edit or change any sample attributes.
With samples imported and annotated, we can begin analysis.
Click Analyses to switch to the Analyses tab
For now, the Analyses tab has only a single node, Single cell counts. As you perform the analysis, additional nodes representing tasks and new data will be created, forming a visual representation of your analysis pipeline.
Click on the Single cell counts node
A context-sensitive menu will appear on the right-hand side of the pipeline (Figure 9). This menu includes tasks that can be performed on the selected counts data node.
An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few counts to be analyzed.
Expand the QA/QC section of the task menu
Click on Single cell QA/QC (Figure 6)
A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running.
Click the Single cell QA/QC node once it finishes running
Click Task report on the task menu (Figure 7)
The Single cell QA/QC report opens in a new data viewer session. There are interactive violin plots showing the most commonly used quality metrics for each cell from all samples combined (Figure 8). For this data set, there are two relevant plots: the total count per cell and the number of detected genes per cell. Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Typically, there is a third plot showing the percentage of mitochondrial counts per cell, but mitochondrial transcripts were not included in the data set by the study authors, so this plot is not informative for this data set.
Remove the % mitochondrial counts and the extra text box in the bottom right by clicking Remove plot in the top right corner of each plot (Figure 8).
The plots are highly customizable and can be used to explore the quality of cells in different samples.
Click on Single cell counts in the Get Data icon on the left (Figure 9)
Click and drag the Sample name attribute onto the Counts plot and drop it onto the X-axis
Repeat this for the Detected genes plot
The cells are now separated into different samples along the x-axis (Figure 10)
Hold Control and left-click to select both plots
Open the Style icon on the left under Configure
Under Color, use the slider to reduce the Opacity
Open the Axis icon on the left
Adjust the X-rotation on the plots to 90
Note how both plots were modified at the same time.
Cells can be selected by setting thresholds using the Select & Filter tool. Here, we will select cells based on the total count
Open Select & Filter under Tools on the left
Under Criteria, click Pin histogram to see the distribution of counts
Set the Counts thresholds to 8000 and 20500
Selected cells will be in blue and deselected cells will be dimmed (Figure 11).
Because this data set was already filtered by the study authors to include only high-quality cells, this count filter is sufficient.
Click Apply observation filter
Click the Single cell counts data node in the pipeline preview (Figure 12)
Click Select
A new task, Filter counts, is added to the Analyses tab. This task produces a new Filter counts data node (Figure 13).
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
Most tasks can be queued up on data nodes that have not yet been generated, so you can either wait for the filtering step to complete or proceed to the next section.
A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes. Because there is no gold standard for what makes a gene informative or not, ideal gene filtering criteria depends on your experimental design and research question. Thus, Partek Flow has a wide variety of flexible filtering options.
Click the Filter counts node produced by the Filter counts task
Click Filtering in the task menu
Click Filter features (Figure 14)
There are four categories of filter available - noise reduction, statistics based, feature metadata, and feature list (Figure 15).
The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.
We will use a noise reduction filter to exclude genes that are not expressed by any cell in the data set but were included in the matrix file.
Click the Noise reduction filter checkbox
Set the Noise reduction filter to Exclude features where value <= 0 in 99% of cells using the drop-down menus and text boxes (Figure 16)
Click Finish to apply the filter
This produces a Filtered counts data node. This will be the starting point for the next stage of analysis - identifying cell types in the data using the interactive t-SNE plot.
We are omitting normalization in this tutorial because the data has already been normalized.
The tutorial data set is taken from a published study and has already been normalized using TPM (Transcripts Per Million), which accounts for feature length and total reads, and transformed as log2[(TPM/10)+1]. This normalization and transformation scheme can be performed in Partek Flow, along with other commonly used RNA-Seq data normalization methods.
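As a hypothetical illustration (the file name and TPM values below are invented for the example), the log2[(TPM/10)+1] transformation can be reproduced with a short awk command:

```shell
# Hypothetical two-column (gene, TPM) tab-delimited input
printf 'GAPDH\t2500\nACTB\t0\n' > /tmp/tpm_demo.txt

# Apply log2[(TPM/10) + 1]; awk's log() is natural log, so divide by log(2)
awk -F'\t' '{ printf "%s\t%.4f\n", $1, log($2/10 + 1)/log(2) }' /tmp/tpm_demo.txt
```

Note that a TPM of 0 maps to 0, so undetected genes remain at zero after the transformation.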
Partek Flow software is designed specifically for the analysis needs of large genomic data. It has a simple-to-use graphical interface, context sensitive menus, powerful statistics, and interactive visualizations to easily find biological meaning in genomic data.
The information found in this manual will help you to get the most out of your Partek Flow software license.
Only use this method if you encounter issues using the transfer function on the Flow homepage; contact support@partek.com to obtain a private key.
The following instructions detail the use of SFTP (Secured File Transfer Protocol) to transfer data to and from your Partek Flow instance. SFTP offers significant performance and security enhancements over FTP for file transfers. It also enables the use of robust file syncing utilities, e.g. RSYNC, and is compatible with common file transfer programs such as FileZilla and WinSCP.
To transfer files with SFTP, you will need to have your Partek Flow:
Server Name. Example: myname.partek.com
Username. Example: flowloginname
Private authentication key
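Putting those three pieces together, a connection would look like the sketch below. All values are placeholders taken from the examples above; substitute the server name, username, and key path from your licensing email.

```shell
# Placeholder credentials -- replace with the values from the Partek licensing email
SERVER=myname.partek.com
USER_NAME=flowloginname
KEY=$HOME/partek/id_rsa

# An interactive SFTP session authenticated with the private key would be opened with:
#   sftp -i "$KEY" "$USER_NAME@$SERVER"
# Once connected, standard commands such as put/get transfer files.
echo "sftp -i $KEY $USER_NAME@$SERVER"
```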
This information should have been e-mailed to you from the Partek licensing team. If you lose this information, contact Partek support and we will resend your authentication key to you.
WinSCP is an open source, free SFTP client for Windows. Its main function is file transfer between a local and a remote computer.
To download WinSCP, visit WinSCP's official site: https://winscp.net/eng/download.php. On the WinSCP page you may need to scroll down a bit to reach the green Download WinSCP button.
Download and install WinSCP on your local computer and then launch the program.
On the Login page click on the New Site icon.
Type in the Host Name, which is the same as the web address that you use to access your instance of Partek Flow
The web address for your instance of Partek Flow has been sent to you by Partek's Licensing team. In this example, the web address is ilukic5i.partek.com.
Type in the User name, which has also been sent to you (and is the same user name that you use to log on to Partek Flow). In this example, the user name is lukici.
To proceed click on the Advanced... button
Then select Authentication in the SSH section of the Advanced Site Settings dialog
Select the ... button (under Private key file) to browse for the id_rsa file.
The file has been sent to you by Partek's licensing team attached to the same email that gave you your URL and username.
If you do not see it in the Select private key file browser, switch the file type filter to All Files (*.*)
Click Open
WinSCP will ask you to confirm file format conversion
Click OK.
WinSCP will create a file in .ppk format.
Click Save to save the converted key file, id_rsa.ppk, to a secure location on your local computer.
Click OK again to confirm the change.
Your private key has been saved in .ppk format and added to WinSCP
Click OK to proceed
Click Save to save the new WinSCP settings.
This will open the Save session as site dialog. You can accept the default name (in this example lukici@ilukic5i.partek.com) or add a custom name. The name that you specify here will appear in the left panel of the Login dialog.
Once you have made your edits, click OK.
On the Login page, select your newly created site (in this example: lukici@ilukic5i.partek.com) and click the Login button.
The first time you connect, a warning message will appear, asking you whether you want to connect to an unknown server.
Click Yes to proceed.
The progress towards establishing a connection will be displayed in a dialog. This process is automatic and you do not need to do anything.
The WinSCP interface is split into two panels. The panel on the left shows the directory structure of your local computer and the panel on the right shows the directory structure of your Partek Flow file server.
To transfer a file, just drag and drop the file from one panel to the other. The progress of your transfer will be shown on the screen.
FileZilla is a graphical file transfer tool that runs on Windows, OSX, and Linux. It is well suited to bulk transfers, as all transfers are added to a queue and processed in the background, and you can browse your files on the Partek Flow server while transfers are active. It is also the best option when you do not have command line access or are uncomfortable with command line operations.
We recommend downloading the FileZilla install packages from us. They are also available from download aggregator sites (e.g. CNET, download.com, sourceforge) but these sites have been known to bundle adware and other unwanted software products into the downloads they provide, so avoid them.
Mac OSX: http://packages.partek.com/bin/filezilla/fz-osx.app.tar.bz2
Windows 32-bit: http://packages.partek.com/bin/filezilla/fz-win32.exe
Windows 64-bit: http://packages.partek.com/bin/filezilla/fz-win64.exe
Linux (Please use your distribution's package manager to install Filezilla):
Ubuntu: install via the package manager, e.g. sudo apt-get install filezilla
RedHat, see the following guide: http://juventusitprofessional.blogspot.com/2013/09/linux-install-filezilla-on-centos-or.html
OpenSuse, see: https://software.opensuse.org/package/filezilla
After starting FileZilla, click on the Site Manager icon located at the top left corner of the FileZilla window.
Click on the New Site button on the left of the popup dialog.
Type in a name for the connection. Example: “Partek SFTP”.
The connection details to the right need to be changed to reflect the information you received via email. The default settings will NOT work.
Set Host: to your partek server name
Leave Port: blank
Change Protocol: to SFTP - SSH File Transfer Protocol
Change User: to your Partek Flow login name
Change Logon Type: to Key File and select the key file received via email.
When selecting your key file, change the file selection from its default of PPK files to All files. Otherwise your key file will not be visible in the file browser.
After selecting your key file, click the Connect button.
Click the checkbox to always trust this host and click OK. Once connected, you can begin to browse and transfer files. The files and folders to the left are on your computer, the ones on the right are on the Flow server.
You will receive a file called id_rsa via email. Download this file, note where you saved it, then use ssh-add to import the key. If you log out or reboot your computer, you will need to re-run ssh-add. Once the key is imported, you will not be asked for a password when transferring files to your Partek Flow server.
RSYNC is useful when resuming a failed transfer. Instead of re-uploading or downloading what has already been transferred, RSYNC will copy only what it needs.
An rsync command can sync the folder "local_folder" with the "remote_folder" on Partek's servers; to transfer in the other direction, reverse the source and destination arguments.
With rsync, don't forget the trailing '/' on directory names.
Before moving the files, we strongly advise you to use FileZilla to explore the directory structure of the Partek server and then create a new directory to transfer the files to.
When you delete files from the Partek Flow server they are gone and can not be recovered.
Please use Partek Flow to delete projects and results. Manually removing data using SFTP could break your server.
Wait until ALL input data for a particular project has been transferred to the Partek Flow server before importing data via Partek Flow. If you try to import samples while the upload is occurring, the import job will crash.
When uploading raw data to a Partek-hosted Flow server, we recommend creating a subfolder for each experiment at the same level as the "FlowData" folder or inside the "FlowData" folder.
Click the button
Click in the CD45RA_TotalSeqB row
Click in the top right corner of the table to open a volcano plot
The plot can be configured using various tools on the left. For example, the Style icon can be used to change the appearance of the points. The X and Y-axes can be changed in the Axes icon. The Statistics icon can be used to set different fold-change and p-value thresholds for coloring up/down-regulated genes. The in-plot controls can be used to transpose the volcano plot (Figure 14).
Click to create a new data node including only these significantly different genes
On the first 2D scatter plot (with protein markers), click in the top right corner
Click in the top right of the plot to switch back to pointer mode
On the second 2D scatter plot (with mRNA markers), click in the top right corner
Click in the top right corner of both 2D scatter plots, to remove them from the canvas
Click in the top right corner of the 3D UMAP plot
Click in the Select & Filter tool to include the selected points
Click in the top right of the plot to switch back to pointer mode
Add the Biomarkers table using the Table option in the New plot menu. You can drag and reposition the table using the button in the top left corner of the plot.
If you need to create more space on the canvas, hide the panel on the left using the arrow.
In Select & Filter, click to remove the CD3_TotalSeqB filtering rule
Click in Select & Filter to exclude the cluster 6/Tfh cells
Click in Select & Filter to exclude the cluster 4/Cytotoxic cells
Click in the top right corner of the UMAP plot
Click in Select & Filter to include the selected points
Click in Select & Filter to exclude the selected points
Click in the top right corner of the UMAP plot
Optionally, you may wish to save this data viewer session if you need to go back and reclassify cells later. To save the session, click the icon on the left and name the session.
Note that QA/QC has not been performed in this example, so that all spots (points) can be visualized on the tissue image. Single cell QA/QC can be performed from the Single cell counts node, with the filtered cells applied to the Single cell counts node before the Filter features task.
Low-quality cells can be filtered out during spatial data analysis using QA/QC; filtered cells will not be shown on the tissue image. We will not perform Single cell QA/QC in this tutorial; this task would be invoked from the Single cell counts node, and the Filter features task discussed below would be invoked from its output node (Filtered counts).
Classify the cells to determine cell types.
Select cell_type from the drop-down and click the green Add button
Double-click the opaque rectangle task to open it and apply the observation filter to the "Filtered to only gene expression" results node. This results in a "Filtered cells" node.
If you have not already, choose Transfer files to upload the files to the server.
The unaligned reads must be preprocessed before proceeding with the analysis steps covered here.
Proceed with analysis from the Single cell counts node.
Click on the homepage, under settings, or during import
The sample table is pre-populated with one sample attribute, # Cells. Sample attributes can be added and edited manually by clicking Manage in the Sample attributes menu on the left. If a new attribute is added, click Assign values to assign samples to different groups. Alternatively, you can use the Assign values from a file option to assign sample attributes using a tab-delimited text file. For more information about sample attributes, see .
If starting with unprocessed data, the Annotate Visium image task will create a new result node, Annotated counts.
Click the blue circle node to the right of the Color by drop-down
To color by the cell type classification we previously determined in this tutorial, use the Color by drop-down and select Cell Type. Cell type is a blue categorical attribute, while green attributes are numerical.
Click on the homepage, under settings, or during import.
Click the blue + Add sample button then use the green Add sample button to add each sample's Xenium output bundle folder. If you have not already transferred the folder to the server, this can be done using Transfer files to the server (Figure 3).
Once the folder has been uploaded to the server, navigate to the appropriate folder for each sample using Add sample (Figure 5).
The sample table is pre-populated with one sample attribute: # Cells. Sample attributes can be added and edited manually by clicking Manage in the Sample attributes menu on the left. If a new attribute is added, click Assign values to assign samples to different groups. Alternatively, you can use the Assign values from a file option to assign sample attributes using a tab-delimited text file. For more information about sample attributes, see . Cell attributes are found under Sample attributes and can be added by .
Download this table with more than 10 features using the Download option
We will create a list with these 10 genes, so that we can use this list in the Gene set enrichment task.
Here we are going to perform gene set enrichment on our top 10 features for the FASN high group, which we have added as a list called "Top 10 FASN high Features".
Please click for more information on Biological interpretation.
The sample table is pre-populated with two sample attributes: # Cells and Subtype. Sample attributes can be added and edited manually by clicking Manage in the Sample attributes menu on the left. If a new attribute is added, click Assign values to assign samples to different groups. Alternatively, you can use the Assign values from a file option to assign sample attributes using a tab-delimited text file. For more information about sample attributes, see .
Click under Filter to include the selected cells
For more information on normalizing data in Partek Flow, please see the section of the user manual.
Figure 1. Download button on WinSCP's official web site. Note: the version may have changed since the time of writing of this document
Figure 2. Adding NewSite on WinSCP's Login page
Figure 3. Adding Host name and User name information. Use the host and user name that has been sent to you by Partek's licensing team
Figure 4. Adding the id_rsa file to WinSCP. Use the Advanced Site Settings tab and select Authentication
Figure 5. Showing all files in the Select private key file dialog
Figure 6. Converting key file format
Figure 7. Private key in .ppk format added to WinSCP
Figure 8. Customising the name for the new site on the Login dialog. In this example, the name is lukici@ilukic5i.partek.com
Figure 9. The first time you connect to your Partek server, WinSCP will present a warning message. Click Yes to connect to the server
Figure 10. Progress of the connection to your Partek server will be displayed on the screen
Figure 11. WinSCP screen after connection divides into two panels: the files on the left are on the local computer, while the files on the right are on the Partek server
Figure 12. Progress of file transfer is shown on screen
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18
Figure 19
Differential expression analysis can be used to compare cell types. Here, we will compare glioma and oligodendrocyte cells to identify genes differentially regulated in glioma cells from the oligodendroglioma subtype. Glioma cells in oligodendroglioma are thought to originate from oligodendrocytes, thus directly comparing the two cell types will identify genes that distinguish them.
To analyze only the oligodendroglioma subtype, we can filter the samples.
Click the Filtered counts data node
Expand Filtering in the task menu
Click Filter cells (Figure 1)
The filter lets us include or exclude samples based on sample ID and attribute.
Set the filter to Include samples where Subtype is Oligodendroglioma
Click AND
Set the second filter to exclude cells where Cell type (multi-sample) is Microglia
Click Finish to apply the filter (Figure 2)
A Filtered counts data node will be created with only cells that are from oligodendroglioma samples (Figure 3).
Click the new Filtered counts data node
Click Statistics > Differential analysis in the task menu
Click GSA
The configuration options (Figure 4) include sample and cell-level attributes. Here, we want to compare different cell types, so we will include Cell type (multi-sample).
Click Cell type (multi-sample)
Click Next
Next, we will set up a comparison between glioma and oligodendrocyte cells.
Click Glioma in the top panel
Click Oligodendrocytes in the bottom panel
Click Add comparison (Figure 5)
This will set up fold calculations with glioma as the numerator and oligodendrocytes as the denominator.
Click Finish to run the GSA
A green GSA data node will be generated containing the results of the GSA.
Double-click the green GSA data node to open the GSA report
Because of the large number of cells and large differences between cell types, the p-values and FDR step up values are very low for highly significant genes. We can use the volcano plot to preview the effect of applying different significance thresholds.
Click to view the Volcano plot
Open the Style icon on the left and change the point size to 6
Open the Axes icon on the left and change the Y-axis to FDR step up (Glioma vs Oligodendrocytes)
Open the Statistics icon and change the Significance of X threshold to -10 and 10 and the Y threshold to 0.001
Open the Select & Filter icon, set the Fold change thresholds to -10 and 10
In Select & Filter, click to remove the P-value (Glioma vs Oligodendrocytes) selection rule. From the drop-down list, add FDR step up (Glioma vs Oligodendrocytes) as a selection rule and set the maximum to 0.001
Note these changes in the icon settings and volcano plot below (Figure 6).
We can now recreate these conditions in the GSA report filter.
Click GSA report tab in your web browser to return to the GSA report
Click FDR step up
Set the FDR step up filter to Less than or equal to 0.001
Press Enter
Click Fold change
Set the Fold change filter to From -10 to 10
Press Enter
The filter should include 291 genes.
Click to apply the filter and generate a Filtered Feature list node
To visualize the results, we can generate a hierarchical clustering heatmap.
Click the Filtered feature list produced by the Differential analysis filter task
Click Exploratory analysis in the task menu
Click Hierarchical clustering/heatmap
Using the hierarchical clustering options we can choose to include only cells from certain samples. We can also choose the order of cells on the heatmap instead of clustering. Here, we will include only glioma cells and order the samples by sample name (Figure 7).
Make sure Cluster is unchecked for Cell order
Click Filter cells under Filtering and set the filter to include Cell type (multi-sample) is Glioma
Choose Sample name from the Cell order drop-down menu in the Assign order section
Click Finish
Double click the green Hierarchical clustering node to open the heatmap
The heatmap differences may be hard to distinguish at first; the range from red to blue with a white midpoint is set very wide because of a few outlier cells. We can adjust the range to make more subtle differences visible. We can also adjust the color.
Set the Range toggle Min to -1.5
Set the Range toggle Max to 1.5
The heatmap now shows clear patterns of red and blue.
Click Axis titles and deselect the Row labels and Column labels of the panel to hide sample and feature names, respectively.
Select Sample name from the Annotations drop-down menu
Cells are now labeled with their sample name. Interestingly, samples show characteristic patterns of expression for these genes (Figure 8).
Click Glioma (multi-sample) to return to the Analyses tab.
We can use gene set enrichment to further characterize the differences between glioma and oligodendrocyte cells.
Click the Filtered feature list node
Click Biological interpretation in the task menu
Click Gene set enrichment
Change Database to Gene set database and click Finish to continue with the most recent gene set (Figure 9)
A Gene set enrichment node will be added to the pipeline.
Double-click the Gene set enrichment task node to open the task report
Top GO terms in the enrichment report include "ensheathment of neurons" and "axon ensheathment" (Figure 10), which corresponds well with the role of oligodendrocytes in creating the myelin sheath that supports and protects axons in the central nervous system.
t-SNE (t-distributed stochastic neighbor embedding) is a visualization method commonly used to analyze single-cell RNA-Seq data. Each cell is shown as a point on the plot and each cell is positioned so that it is close to cells with similar overall gene expression. When working with multiple samples, a t-SNE plot can be drawn for each sample or all samples can be combined into a single plot. Viewing samples individually is the default in Partek Flow because sample to sample variation and outlier samples can obscure cell type differences if all samples are plotted together. However, as you will see in this tutorial, in some data sets, cell type differences can be visualized even when samples are combined.
Using the t-SNE plot, cells can be classified based on clustering results and differences in expression of key marker genes.
Prior to performing t-SNE, it is a good idea to reduce the dimensionality of the data using principal components analysis (PCA).
Click the Filtered counts data node after the Filter features task
Select PCA from the Exploratory analysis section of the task menu (Figure 1)
Click Finish to run PCA with default settings (Figure 2)
Note, the default settings include the Split by sample checkbox being selected. This means that the dimensionality reduction will be performed on each sample separately.
PCA task and data nodes will be generated.
Click the PCA data node
Select t-SNE from the Exploratory analysis section of the task menu (Figure 3)
Click Finish from the t-SNE dialog to run t-SNE with the default settings (Figure 4)
Because the upstream PCA task was performed separately for each sample, the t-SNE task will also be performed separately for each sample. t-SNE task and data nodes will be generated (Figure 5).
Once the t-SNE task has completed, we can view the t-SNE plots
Click the t-SNE node
Click Task report from the task menu or double click the t-SNE node
The t-SNE will open in a new data viewer session. The t-SNE plot for the first sample in the data set, MGH36 (Figure 6), will open on the canvas. Please note that the appearance of the t-SNE plot may differ each time it is drawn so your t-SNE plots may look different than those shown in this tutorial. However, the cell-to-cell relationships indicated will be the same.
The t-SNE plot is in 3D by default. To change the default, click your avatar in the top right corner, choose Settings > My Preferences, and change the default scatter plot format from 3D to 2D under your graphics preferences.
You can rotate the 3D plot by left-clicking and dragging your mouse. You can zoom in and out using your mouse wheel. The 2D t-SNE is also calculated and you can switch between the 2D and 3D plots on the canvas. We will do this later on in the tutorial.
Each sample has its own plot. We can switch between samples.
Open the Axes icon on the left under Configure (Figure 7)
Navigate to Misc
Select the icon below the Sample name to go to the next sample
The t-SNE plot has switched to show the next sample, MGH42 (Figure 7).
The goal of this analysis is to compare malignant cells from two different glioma subtypes, astrocytoma and oligodendroglioma. To do this, we need to identify the malignant cells to include and the normal cells to exclude.
The t-SNE plot in Partek Flow offers several options for identifying, selecting, and classifying cells. In this tutorial, we will use the expression of known marker genes to identify cell types.
To visualize the expression of a marker gene, we can color cells on the t-SNE plot by their expression level.
Select any of the count data nodes from Get data on the left (Single cell counts, or any of the Filtered counts, Figure 8)
Search for the BCAN gene
Click and drag the BCAN gene onto the plot and drop it over the Green (feature) option
The cells will be colored from black to green based on their expression level of BCAN, with higher-expressing cells appearing more green (Figure 9). BCAN is highly expressed in glioma cells.
In Partek Flow, we can color cells by more than one gene. We will now add a second glioma marker gene, GPM6A.
Select any of the count data nodes from the Data card on the left (Single cell counts, or any of the Filtered counts)
Search for the GPM6A gene
Click and drag the GPM6A gene onto the plot and drop it over the Red (feature) option
Cells expressing GPM6A are now colored red and cells expressing BCAN are colored green. Cells expressing both genes are colored yellow, while cells expressing neither are colored black (Figure 10).
Numerical expression levels for each gene can be viewed for individual cells.
Switch to pointer mode by clicking in the top right corner of the plot
Select a cell by pointing and clicking
The expression level for that cell is displayed on the legend for each gene. Expression values can also be viewed by mousing over a cell (Figure 11).
Deselect the cell by clicking on any blank space on the plot
Now that cells are colored by the expression of two glioma cell markers, we can classify any cell that expresses these genes as glioma cells. Because t-SNE groups cells that are similar across the high-dimensional gene expression data, we will consider cells that form a group where the majority of cells express BCAN and/or GPM6A as the same cell type, even if they do not express either marker gene.
Switch to lasso mode by clicking in the top right of the plot
Draw the lasso around the cluster of green, red, and yellow cells and click the circle to close the lasso (Figure 12)
Selected cells are shown in bold and unselected cells are dimmed. The number of selected cells is indicated in the figure legend. The cells are plotted on the color scale depending on their relative expression levels of the two marker genes (Figure 13)
Click Classify selection in the Classify icon under Tools
A dialog to give the classification a name will appear.
Name the classification Glioma
Click Save (Figure 14)
Once cells have been classified, the classification is added to Classify. The number of cells belonging to the classification is listed. In MGH42, there are 460 glioma cells (Figure 15).
Classifications made on the t-SNE plot are retained as a draft as part of the data viewer session. In this tutorial, we will classify malignant cells for each sample before we save and apply the classifications, but if necessary, you can save the data viewer session by clicking the Save icon on the left to retain all of the formatting and draft classifications. The data viewer session will be stored under the Data viewer tab and can be re-opened to continue making classifications at a later time.
Switch to pointer mode by clicking in the top right corner of the plot
Deselect the cells by clicking on any blank space on the plot
Open Axes and navigate to Sample under Misc
Select the icon below the sample name to go to the next sample, MGH45
Rotate the 3D t-SNE plot to get a better view of cells from the green, red, and yellow cluster
Switch to lasso mode by selecting in the top right corner of the plot
Draw the lasso around the cluster of colored cells and click the circle to close the lasso (Figure 16)
Select Classify selection in the Classify icon
Type Glioma or select Glioma from the drop-down list (Figure 17)
Click Save
Repeat these steps for each of the 6 remaining samples. Remember to go back to the first sample (MGH36) to classify the glioma cells in that sample too.
There should be 5,322 glioma cells in total across all 8 samples.
The classification name can be edited or deleted (Figure 18).
With the malignant cells in every sample classified, it is time to save the classifications.
Click Apply classifications in the Classify icon
Name the classification attribute Cell type (sample level)
Click Run (Figure 19)
The new attribute is stored in the Data tab and is available to any node in the project.
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
For some data sets, cell types can be distinguished when all samples can be visualized together on one t-SNE plot. We will use a t-SNE plot of all samples to classify glioma, microglia, and oligodendrocyte cell types.
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Click the Filtered counts data node after the Filter features task
Click PCA in the Exploratory analysis section of the task menu
Uncheck the Split by sample checkbox (Figure 22)
Click Finish
The PCA task will run as a new green layer.
Click the new PCA data node
Select t-SNE from the Exploratory analysis section of the task menu
Click Finish to run the t-SNE task with default settings
The t-SNE task will be added to the green layer (Figure 23). Layers are created in Partek Flow when the same task is run on the same data node.
Once the task has completed, we can view the plot.
Double-click the green t-SNE data node to open the t-SNE scatter plot
Click and drag the 2D scatter plot icon onto the canvas and replace the 3D scatter plot (Figure 24)
Search for and select the green t-SNE data node (Figure 25)
In the Style icon, choose Sample name from the Color by drop-down list under Color
Viewing the 2D t-SNE plot, we can see that while most cells cluster by sample, there are a few clusters with cells from multiple samples (Figure 26).
Using marker genes, BCAN (glioma), CD14 (microglia), and MAG (oligodendrocytes), we can assess whether these multi-sample clusters belong to our known cell types.
Select any of the count data nodes from the Data card on the left (Single cell counts, or any of the Filtered counts)
Search for the BCAN gene
Click and drag the BCAN gene onto the plot and drop it over the Green (feature) option
Search for the CD14 gene
Click and drag the CD14 gene onto the plot and drop it over the Red (feature) option
Search for the MAG gene
Click and drag the MAG gene onto the plot and drop it over the Blue (feature) option
After coloring by these marker genes, three cell populations are clearly visible (Figure 27).
The red cells are CD14 positive, indicating that they are the microglia from every sample.
Switch to lasso mode by clicking the icon in the top right of the plot
Draw the lasso around the cluster of red cells and click the circle to close the lasso (Figure 28)
Open the Classify tool and click Classify selection
Name the classification Microglia
Click Save
The blue cells are MAG positive, indicating that they are the oligodendrocytes from every sample.
Switch to pointer mode by clicking in the top right corner of the plot
Deselect the cells by clicking on any blank space on the plot
Switch to lasso mode again by clicking the icon in the top right of the plot
Draw the lasso around the cluster of blue cells and click the circle to close the lasso
Open the Classify tool and click Classify selection
Name the classification Oligodendrocytes
Click Save
Finally, we will classify the BCAN expressing cells on the plot as glioma cells from every sample.
Switch to pointer mode by clicking in the top right corner of the plot
Deselect the cells by clicking on any blank space on the plot
Switch to lasso mode again by clicking the icon in the top right of the plot
Draw the lasso around the cluster of green cells and click the circle to close the lasso
Open the Classify tool and click Classify selection
Name the classification Glioma
Click Save
Switch to pointer mode by clicking in the top right corner of the plot
Deselect the cells by clicking on any blank space on the plot
The number of cells classified as microglia, oligodendrocytes, and glioma are shown in Classify (Figure 29)
Click Apply classifications in the Classify icon (Figure 30)
Name the classification attribute Cell type (multi-sample) (Figure 31)
Click Run
The new attribute is now available for downstream analysis.
Click on the Glioma (multi-sample) project name at the top to go back to the Analyses tab
Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab
For your convenience, here is a video showing the steps below.
This guide illustrates how to process FASTQ files produced using the 10x Genomics Chromium Single Cell ATAC assay to obtain a Single cell counts data node, which is the starting point for analysis of single-cell ATAC experiments.
If you are new to Partek Flow, please see Getting Started with Your Partek Flow Hosted Trial for information about data transfer and import and Creating and Analyzing a Project for information about the Partek Flow user interface.
If you would like to follow along exactly, this tutorial uses a 10x Genomics 5k PBMC dataset.
We recommend uploading your FASTQ files (fastq.gz) to a folder on your Partek Flow server before importing them into a project. Data files can be transferred into Flow from the Home page by clicking the Transfer file button (Figure 1). Follow the instructions in Figure 1 to complete the data transfer. You can change the Upload directory by clicking the Browse button and either selecting another existing directory or creating a new one.
To create a new project, from the Home page click the New Project button; enter a project name and then click Create project. Once a new project has been created, click the Add data button in the Analyses tab.
In the Single cell > scATAC-Seq section, select fastq and click Next. The file browser interface will open (Figure 3). Select the FASTQ files using the file browser interface and click the Finish button to complete the task. Paired-end reads will be automatically detected and multiple lanes for the same sample will be automatically combined into a single sample. We encourage users to include all the FASTQ files, including the index files, although the index files are optional.
When the FASTQ files have finished importing, the Unaligned reads data node will appear in the Analyses tab.
To process single cell ATAC-seq FASTQ data, Partek Flow has wrapped the 'cellranger-atac count' pipeline from Cell Ranger ATAC v2.0[1]. It takes FASTQ files and performs multiple analyses simultaneously, including read filtering and alignment, barcode counting, identification of transposase cut sites, peak and cell calling, and generation of the count matrix.
To run the Cell Ranger - ATAC task:
Click the Unaligned reads data node
Select Cell Ranger - ATAC in the 10x Genomics section in the task menu on the right
Select Single cell ATAC in Assay type for ATAC-Seq data only
Choose the proper Reference assembly for the data (you may have to create the reference)
Press the Finish button to run the task with default settings (Figure 4)
To learn more about how to run Cell Ranger - ATAC task in Flow, please refer to our online documentation.
The output of the count matrix then becomes the starting point for downstream analysis for scATAC-seq data in Flow (Figure 5).
An important step in analyzing single cell ATAC data is to filter out low quality cells. A few examples of low-quality cells are doublets, cells with a low TSS enrichment score, cells with a high proportion of reads mapping to the genomic blacklist regions, or cells with too few reads to be analyzed. Users are able to do this in Partek Flow using the Single cell QA/QC task.
Click on the Single cell counts node
Click on the QA/QC section in the task menu
Click on Single cell QA/QC
A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running (Figure 5).
Click the Single cell QA/QC node once it finishes running
Click Task report in the task menu
The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 6).
There are five plots: Nucleosome signal, TSS enrichment, % reads in peaks, Blacklist ratio, and Peak region fragments. Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Cells can be filtered either by clicking and dragging to select a region on one of the plots or by setting thresholds using the filters below the plots. Here, we will apply a filter for the number of read counts. The plot will be shaded to reflect the filter. Cells that are excluded will be shown as black dots on both plots.
Descriptions of QC metrics:
Nucleosome signal: calculated per cell, this metric quantifies the approximate ratio of mononucleosomal to nucleosome-free fragments. The histogram of DNA fragment sizes (determined from the paired-end sequencing reads) should exhibit a strong nucleosome banding pattern corresponding to the length of DNA wrapped around a single nucleosome.
TSS enrichment: Transcriptional start site (TSS) enrichment score. The ENCODE project has defined an ATAC-seq targeting score based on the ratio of fragments centered at the TSS to fragments in TSS-flanking regions (see https://www.encodeproject.org/data-standards/terms/). Poor ATAC-seq experiments typically will have a low TSS enrichment score.
Peak region fragments: total number of fragments in peaks which is a measure of cellular sequencing depth/complexity. Cells with very few reads may need to be excluded due to low sequencing depth. Cells with extremely high levels may represent doublets, nuclei clumps, or other artifacts.
% reads in peaks: Represents the fraction of all fragments that fall within ATAC-seq peaks. Cells with low values (i.e. <15-20%) often represent low-quality cells or technical artifacts that should be removed. Note that this value can be sensitive to the set of peaks used.
Blacklist ratio: The ENCODE project has provided a list of blacklist regions, representing reads which are often associated with artifactual signals. Cells with a high proportion of reads mapping to these areas (compared to reads mapping to peaks) often represent technical artifacts and should be removed.
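The TSS enrichment idea above can be sketched numerically: count fragments centered near the TSS and divide by fragments in the flanking background. This is a toy illustration with assumed window sizes, not the exact ENCODE definition or Partek Flow's implementation.

```python
import numpy as np

# Toy sketch of TSS enrichment: fragments centered near the TSS divided by
# fragments in the flanking background. Window sizes are illustrative
# assumptions, not the exact ENCODE parameters.
def tss_enrichment(fragment_centers, tss_pos, center_bp=500, flank_bp=1000):
    d = np.abs(np.asarray(fragment_centers) - tss_pos)
    at_tss = np.sum(d <= center_bp / 2)
    in_flank = np.sum((d > center_bp / 2) & (d <= center_bp / 2 + flank_bp))
    return at_tss / max(in_flank, 1)   # high when fragments pile up at the TSS
```

A good library yields many fragments close to the TSS relative to the flanks, giving a high score; a poor experiment gives a score near 1 or below.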
To filter out low quality cells (Figure 7),
Open the Select & Filter menu
Set the filters to Nucleosome signal < 4 and Peak region fragments 500-30000; leave the rest as they are
Click the filter icon and Apply observation filter to run the Filter cells task on the Single cell counts data node; this generates a Filtered cells node
Another common task is to filter the data to include only informative features. Partek Flow has a wide variety of flexible filtering options.
The Filter features task can be invoked from any counts or single cell data node. Noise Reduction and Statistics Based filters take each feature and perform the specified calculation across all the cells. The filter is applied to the values in the selected data node and the output is a filtered version of the input data node.
In the task dialog, click the check box to activate one or more of the filter types, configure the filter(s), and click Finish to run (Figure 8).
To help interpret the role of enriched regions in regulating gene expression, Flow uses the Annotate regions task to add information about overlapping or nearby genomic features, giving regulatory context for enriched regions.
The input for Annotate regions is a Peaks type data node.
Click the Filtered features data node
Click the Peak analysis section in the toolbox
Click Annotate regions
Set the Genomic overlaps parameter
The Genomics overlaps parameter lets you choose one of two options (Figure 9).
Report one gene region per peak (precedence applies) chooses one gene section for each peak using the precedence order to settle cases where more than one gene section overlaps a peak. The order of precedence is TSS, TTS, CDS Exon, 5' UTR Exon, 3' UTR Exon, Intron, Intergenic.
Report all gene regions per peak creates a row for each gene section that overlaps a peak in the task report.
Users are able to define the transcription start site (TSS) and transcription termination site (TTS) limits in bp.
Choose a gene/feature annotation from the drop-down menu
Click Finish to run
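The precedence rule described above can be sketched in a few lines: when a peak overlaps more than one gene region, report the region that comes earliest in the precedence order. This is an illustrative sketch, not Partek Flow's actual code.

```python
# Precedence order from the documentation: earliest entry wins when a
# peak overlaps several gene regions (illustrative sketch only).
PRECEDENCE = ["TSS", "TTS", "CDS Exon", "5' UTR Exon",
              "3' UTR Exon", "Intron", "Intergenic"]

def pick_region(overlapping_regions):
    # choose the overlapping region with the highest precedence
    return min(overlapping_regions, key=PRECEDENCE.index)
```

For example, a peak overlapping a TSS, a CDS exon, and an intron would be reported as TSS under the "one gene region per peak" option.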
Latent semantic indexing (LSI) was first introduced for the analysis of scATAC-seq data by Cusanovich et al. 2018[2]. LSI combines term frequency-inverse document frequency (TF-IDF) normalization followed by singular value decomposition (SVD). Partek Flow wraps Signac's TF-IDF normalization for single cell ATAC-seq datasets. It is a two-step normalization procedure that both normalizes across cells to correct for differences in cellular sequencing depth, and across peaks to give higher values to more rare peaks[3].
TF-IDF normalization in Flow can be invoked in Normalization and scaling section by clicking any single cell counts data node (Figure 10).
To run TF-IDF normalization,
Click a Single cell counts data node, in this case the Annotated regions node
Click the Normalization and scaling section in the toolbox
Click TF-IDF normalization
The output of TF-IDF normalization is a new data node that has been normalized by log(TF x IDF).
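The two-step procedure can be sketched as follows: TF scales each cell by its sequencing depth, IDF up-weights rarer peaks, and the product is log-transformed. This is an illustrative sketch of the log(TF x IDF) idea, not Partek Flow's or Signac's exact implementation.

```python
import numpy as np

def tfidf_normalize(counts):
    """TF-IDF normalization sketch for a peaks x cells count matrix.

    Illustrative only -- details (e.g. scale factors) differ in real
    implementations such as Signac's.
    """
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each cell's counts divided by that cell's total depth
    tf = counts / counts.sum(axis=0, keepdims=True)
    # Inverse document frequency: total cells / cells in which peak is seen,
    # so rarer peaks receive larger weights
    n_cells = counts.shape[1]
    idf = n_cells / np.maximum((counts > 0).sum(axis=1, keepdims=True), 1)
    # log1p keeps zero counts at zero and avoids log(0)
    return np.log1p(tf * idf)
```

Zero entries stay zero after normalization, and cells with very different total counts become comparable.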
Singular value decomposition (SVD) is applied to the TF-IDF output in scATAC-Seq data. It returns a reduced-dimension representation of a matrix. Although SVD and principal components analysis (PCA) are two different techniques, SVD has a close connection to PCA: PCA can be computed via the SVD. For users who are more familiar with scRNA-Seq, you can think of the SVD output as analogous to the output of PCA. Similarly, the statistical interpretation of singular values is in terms of the variance in the data explained by the various components.
To run SVD task,
Click a Normalized counts data node
Click the Exploratory analysis section in the toolbox
Click SVD
The SVD dialog only asks for the number of singular values to compute (Figure 11). By default, 100 singular values are computed; the number can be adjusted manually or typed in directly. Simply click the Finish button to run the task with the default settings.
The task report for SVD is similar to PCA. Its output will be used for downstream analysis and visualization, including Harmony and WNN.
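The reduced-dimension representation described above can be sketched with NumPy on toy data (the matrix sizes and data here are made up for illustration; this is not what Flow runs internally):

```python
import numpy as np

# Reduced-dimension representation via SVD, analogous to the SVD task's
# output on the TF-IDF matrix (illustrative sketch on toy data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))   # e.g. 200 cells x 50 peaks (toy data)

# Keep only the top k singular values/vectors
k = 10
U, s, Vt = np.linalg.svd(X, full_matrices=False)
embedding = U[:, :k] * s[:k]         # cells projected onto k components

# Singular values indicate how much structure each component captures,
# analogous to variance explained by principal components in PCA.
print(embedding.shape)               # (200, 10)
```

Downstream steps (clustering, UMAP) then operate on the k-dimensional `embedding` rather than the full matrix, which removes much of the noise.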
Graph-based clustering (Figure 12) identifies groups of similar cells using SVD values as the input. By including the informative SVDs, noise in the data set is excluded, improving the results of clustering.
Click the SVD output
Click Exploratory analysis in the task menu
Click Graph-based clustering
Check Compute biomarkers
Click Finish to run as default
A new Graph-based clusters data and a Biomarkers data node will be generated.
Double-click the Graph-based clusters node to see the cluster results and statistics (Figure 13)
Double-click the Biomarkers node to see the computed biomarkers if you have selected this option (Figure 14)
The Graph-based clustering result (Figure 13) lists the Total number of clusters and what proportion of cells fall into each cluster as well as Maximum modularity which is a measurement of the quality of the clustering result where optimal modularity is 1. The Biomarkers report (Figure 14) includes the top features for each graph-based cluster. It displays the top-10 genes that distinguish each cluster from the others. Download at the bottom right of the table can be used to view and save more features. These are calculated using an ANOVA test comparing the cells in each group to all the other cells, filtering to genes that are 1.5 fold upregulated, and sorting by ascending p-value. This ensures that the top-10 genes of each cluster are highly and disproportionately expressed in that cluster.
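The biomarker selection logic described above can be sketched as follows. Here a Welch t-statistic stands in for Flow's ANOVA p-value (an assumption for illustration; larger |t| corresponds to smaller p), and `expr` is assumed to be a cells x features matrix:

```python
import numpy as np

# Sketch of the biomarker logic: compare each feature in one cluster
# against all other cells, keep features >= 1.5-fold up-regulated, and
# rank by significance. Illustrative only -- Flow's ANOVA differs in detail.
def top_biomarkers(expr, clusters, cluster_id, fold=1.5, n_top=10):
    in_c = clusters == cluster_id
    a, b = expr[in_c], expr[~in_c]
    ratio = a.mean(axis=0) / np.maximum(b.mean(axis=0), 1e-9)
    # Welch t-statistic per feature; larger |t| ~ smaller p-value
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(
        a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    candidates = np.where(ratio >= fold)[0]
    order = candidates[np.argsort(-np.abs(t[candidates]))]
    return order[:n_top]             # feature indices, most significant first
```

This captures the two filters named in the report: the fold-change cutoff ensures the markers are disproportionately expressed in the cluster, and the significance ranking puts the most reliable markers first.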
Similar to t-SNE, Uniform Manifold Approximation and Projection (UMAP) is a dimensional reduction technique. UMAP aims to preserve the essential high-dimensional structure and present it in a low-dimensional representation. UMAP is particularly useful for visually identifying groups of similar samples or cells in large high-dimensional data sets.
To run UMAP (Figure 15):
Click the SVD data node
Click the Exploratory analysis section of the toolbox
Click UMAP
Click Finish to run with default settings
UMAP produces a UMAP task node. Opening the task report launches a scatter plot showing the UMAP results. Each point on the plot is a cell for single cell data. The plot will open in 2D or 3D depending on the user preference.
The Annotate regions task in Flow labels individual peaks as promoters for a particular gene if the peak falls 1000 bases upstream from a gene's transcription start site, or 100 bases downstream from a gene's transcription start site by default (Figure 9). A promoter sum for a given gene is the number of cut sites per cell that fall within all the peaks labeled as promoters (-1000bp ~ 100bp by default or user defined through Annotate regions) for that gene. Higher promoter sum values indicate higher chromatin accessibility in the promoter region [4].
The Promoter sum matrix task in Flow summarizes the promoter sums and outputs a cell x gene matrix. Only genes that have peaks within their promoter regions are included in the matrix. In Flow, Promoter sum matrix can be invoked in the Peak analysis section by clicking the Annotated regions data node (Figure 16).
To run Promoter sum matrix in Flow,
Click the Annotated regions data node
Click the Peak analysis section in the toolbox
Click Promoter sum matrix
Once the task has finished, a new data node will be produced; the promoter sum value for each feature can be used to color UMAP/t-SNE plots and to determine cell types from the raw data. We recommend normalizing this output before coloring the UMAP, just as for scRNA-seq data.
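The promoter sum computation described above can be sketched as follows: cut-site counts for peaks annotated as promoters are summed per gene, producing a cell x gene matrix. The peak counts and gene names here are made up for illustration.

```python
import numpy as np

# Sketch of the promoter sum: sum cut sites per cell across all peaks
# labeled as promoters of the same gene. Data is invented for illustration.
peak_counts = np.array([   # peaks x cells cut-site counts
    [2, 0, 1],             # peak1, promoter of GENE_A
    [1, 3, 0],             # peak2, promoter of GENE_A
    [0, 1, 4],             # peak3, promoter of GENE_B
])
promoter_gene = ["GENE_A", "GENE_A", "GENE_B"]

genes = sorted(set(promoter_gene))
promoter_sum = np.zeros((peak_counts.shape[1], len(genes)), dtype=int)
for i, gene in enumerate(promoter_gene):
    promoter_sum[:, genes.index(gene)] += peak_counts[i]

# cells x genes: each entry is the cut sites falling in that gene's promoters
print(promoter_sum)
```

A higher entry means higher chromatin accessibility in that gene's promoter region for that cell, which is why these values are useful for coloring the UMAP.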
Double-clicking the UMAP task node will open the task report in the Data Viewer.
To classify a cell, select it and then click Classify selection in the Classify tool.
For example, we can classify a cluster of cells expressing high levels of MS4A1 as B cells.
Make sure the right data source has been selected. For scATAC-seq data, this should be the normalized counts of promoter sum values in most cases (Figure 17)
Set Color by in the Style configuration to the normalized counts node
Type MS4A1 in the search box and select it. Rotate the 3D plot if you need to see this cluster more clearly.
Click to activate Lasso mode
Draw a lasso around the cluster of MS4A1-expressing cells
Click Classify selection under Tools in the left panel
Type B cells for the Name
Click Save (Figure 18)
Repeat the above steps to finish the other cell type classifications. To be able to use the classifications in downstream tasks and visualizations, you must first apply them.
Click Apply classifications
Name the classification (e.g. Cell type)
Check Compute biomarkers if needed
Click Run to complete the task
Once the classifications have been added to the project, one can color the UMAP/t-SNE plot by the Classification or compare the differentially expressed genes between different cell types.
To identify genes that distinguish a cell type, one can use the differential analysis tools in Partek Flow.
Click the TF-IDF normalized counts data node
Click the Differential analysis section in the toolbox
Click Hurdle model
Select the factors and interactions to include in the statistical test (Figure 19). Cell type has been selected here as an example.
Click Next
Define comparisons between factor or interaction levels (Figure 20)
Click Add comparison to add the comparison to the Comparisons table.
Click Finish to run the statistical test as default
Hurdle model produces a Feature list task node. The results table and options are the same as the GSA task report except the last two columns. The percentage of cells where the feature is detected (value is above the background threshold) in different groups (Pct(group1), Pct(group2)) are calculated and included in the Hurdle model report.
A filtered Feature list data node can be produced by running the Differential analysis filter in the Hurdle model task report (Figure 21).
Once we have filtered a list of differentially expressed genes, we can visualize these genes by generating a heatmap, or perform the Gene set enrichment analysis and motif detection.
For information about automating steps in this analysis workflow, please see our documentation page on Making a Pipeline.
Cusanovich, D., Reddington, J., Garfield, D. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555, 538–542 (2018). https://doi.org/10.1038/nature25981
Partek Flow supports the import of filtered gene-barcode matrices generated by 10x Genomics' Cell Ranger pipeline.
Below is a video summarizing the import of these files:
To import the matrices into Partek Flow, create a new project and click Add data then select Import scRNA count feature-barcode-mtx under Single cell > scRNA-Seq.
Samples can be added using the Add sample button. Each sample should be given a name and three files should be uploaded per sample using the Browse button.
If you have not already, transfer the files to the server to be accessed when you click Browse. Follow the directions here to add files to the server. Make sure the files are decompressed before they are uploaded to the server.
By default, the Cell Ranger pipeline output will have a folder called filtered_gene_bc_matrices (Figure 3). It is helpful to rename and organize the files prior to transfer using the File browser.
There are folders nested within the matrix folder, typically representing the reference genome it was aligned to. Navigate to the lowest subfolder, this should contain three files:
barcodes.tsv
genes.tsv
matrix.mtx
Select all 3 files for import into Partek Flow
Specify the annotation file used when running the pipeline to include additional information such as mitochondrial counts (Figure 4). Other options can also be configured, such as the count value format. Either all features can be reported, or only features with non-zero values across all samples; the read count threshold can also be modified to make the import more efficient.
Click Finish when you have completed configuration. This will queue the import task.
The Cell Ranger pipeline can also generate the same filtered gene barcode matrix in h5 format. This gives you the ability to select just one file per matrix and select multiple matrices to import in batch. To import an h5 matrix, select the Import scRNA full count matrix or h5 option (Figure 1). Browse for the files and modify any configuration options. Remember the files need to be transferred to the server.
This feature is also useful for importing multiple samples in batch. Simply put all the h5 files from your experiment in a single folder, navigate to the folder, and select all the matrices you would like to import.
Configure all the relevant sample metadata, including sample name and the annotation that was used to generate the matrices, and click Finish when completed. Note that all matrices must have been generated using the same reference genome and annotation to be imported into the same project.
Raw output data generated by the 10x Genomics' Xenium Onboard Analysis pipeline consists of decoded transcript counts and morphology images. The raw output and other standard output files derived from them are compiled into a zipped file called Xenium Output Bundle.
To import the Xenium Output Bundle into Partek Flow, create a new project and click Add data, then select Import 10x Genomics Xenium under Single cell > Spatial, click Next.
Samples can be added using the Add sample button. Each sample should be given a name, and a folder containing the six required files (cell_feature_matrix.h5, cells.csv.gz, cell_boundaries.csv.gz, nucleus_boundaries.csv.gz, transcripts.csv.gz, morphology_focus.ome.tif) should be uploaded per sample using the Browse button. All six required files are included in the Xenium Output Bundle folder.
If you have not already, transfer the files to the server to be accessed when you click Browse. Follow the directions here to add files to the server. You will need to decompress the Xenium Output Bundle zip file before uploading it to the server. After decompression, you can drag and drop the entire folder into the Transfer files dialog; all individual files in the folder will be listed in the dialog after the drag and drop, with no folder structure. The folder structure will be restored after the upload is completed.
Once you have uploaded the folder into the server, you can continue to select the folder for each sample from Browse. Once the folder is selected, the Cells and Features values will auto-populate. You can choose an annotation file that matches what was used to generate the feature count. Then, click Finish to start importing the data into your project.
Partek Flow can import a wide variety of data types including raw counts, matrices, microarray, variant call files as well as unaligned and aligned NGS data.
The following file types are valid and will be recognized by the Partek Flow file browser.
bam
bcf
bcl
bgx
bpm
cbcl
CEL
csv
fa
fasta
fastq
fcs
fna
fq
gz
h5ad
h5 matrix
idat
loom
mtx
probe_tab
qual
raw
rds
sam
sff
sra
tar
tsv
txt
vcf
zip
In cases where paired end fastq data is present, files will also be automatically recognized and their paired relationship will be maintained throughout the analysis.
Matching of paired-end files is based on file names: every character in both file names must match, except for the section that determines whether a file is the first or the second file. For instance, if the first file contains "_R1", "_1", "_F3", or "_F5" in the file name, the second file must contain something along the lines of the following: "_R2", "_2", "_F5", "_F5-P2", "_F5-BC", "_R3", "_R5", etc. The identifying section must be separated from the rest of the filename with underscores or dots. If two conflicting identifiers are present, the file is treated as single end. For example, s_1_1 matches s_1_2, as described above. However, s_2_1 does not mate with s_1_2 and the files will be treated as two single-end files.
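As an illustration only (this is not Partek Flow's actual implementation, and the helper name and identifier list are hypothetical), the pairing rule can be sketched in Python: two filenames are mates when their underscore/dot-delimited sections are identical except at exactly one position holding a known first/second identifier pair.

```python
import re

# Known (first, second) pair identifiers: an illustrative subset only.
PAIRS = {("R1", "R2"), ("1", "2"), ("F3", "F5")}

def are_mates(f1, f2):
    """Return True if the two filenames differ only in a single
    '_'- or '.'-delimited section that forms a known pair identifier."""
    t1, t2 = re.split(r"[_.]", f1), re.split(r"[_.]", f2)
    if len(t1) != len(t2):
        return False
    diffs = [(a, b) for a, b in zip(t1, t2) if a != b]
    # Exactly one differing section, and it must be a recognised pair;
    # anything else leaves the files treated as single end.
    return len(diffs) == 1 and diffs[0] in PAIRS

print(are_mates("s_1_1.fastq", "s_1_2.fastq"))   # True
print(are_mates("s_2_1.fastq", "s_1_2.fastq"))   # False
```

This reproduces the examples above: s_1_1 mates with s_1_2, while s_2_1 and s_1_2 differ in two sections and stay single end.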
Apart from paired-end data, files with conventional filename suffixes that indicate that they belong to the same sample are consolidated. These suffixes include:
Adapter sequences
"bbbbbb" followed by "" or at the end of the file name, where each "b" is "A", "C", "G", or "T"
Lane numbers
"L###" followed by "" or at the end of the file name, where each "#" is a digit 0 to 9
Dates
in the form "####-##-##" preceded or followed by a period or underscore
Set number
of the form "_###" from the end
The file browser is used to transfer files to the server so that these files can be added to a project for analysis. If you are importing a Bioproject from GEO/ENA or using URLs for data import, there is no need to transfer the files to the server.
To access the file browser and upload data to the server, use any of these options:
access Transfer files on the Partek® Flow® homepage
within a project, after selecting the file type to transfer, using the transfer files link available within all file import options
from the settings, go to Access management > Transfer files
Using the file browser to transfer files to the server:
Click Transfer files to access the file browser
Drag and drop or click My Device to add files from your machine
Click Browse to modify the Upload directory or create a new folder. The Upload directory should be specified, known, and distinguishable for project file management. You will return to this directory and access the files to import them into a project
To continue to add more files use + Add more in the top right corner. To cancel the process select Cancel in the top left corner
Click Upload to complete the file upload
Do not exit the browser tab or let the computer go to sleep or shut down until the transfer has completed
The file size displayed in the table uses binary units, not decimal units (e.g. GB in the table means gibibyte, not gigabyte: 1 gibibyte is 1,073,741,824 bytes and 1 gigabyte is 1,000,000,000 bytes, so 1 gibibyte is about 1.074 gigabytes).
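The two conventions can be compared with a quick calculation:

```python
GIBIBYTE = 2 ** 30   # binary unit: 1,073,741,824 bytes (the table's "GB")
GIGABYTE = 10 ** 9   # decimal unit: 1,000,000,000 bytes

size_bytes = 5 * GIBIBYTE               # a file reported as 5 GB in the table
print(size_bytes)                       # 5368709120
print(round(size_bytes / GIGABYTE, 3))  # 5.369 decimal gigabytes
```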
Some projects contain more than one file type, such as single cell multi-omic assays that generate protein and RNA data together. In these cases, the files need to be associated with each other if starting in fastq format. If starting with processed data, there is no need to associate the files.
Define the type of data the file represents when importing files into the project.
If this step is skipped, the data type can be changed after import by right clicking the data node.
After importing both types of data, associate fastqs with the already imported data (e.g. associate RNA fastqs with ATAC fastqs already imported into Partek Flow).
Partek Flow supports .bcl files based on 10x Genomics library preparation. The following document will guide you through the steps.
To start the import, create a new project and then select Import Data > Import bcl files. The Import bcl dialog will come up (Figure 1).
Use the Data directory option to point to the location of the directory holding the data. It is located at the top level of the run directory and is typically labeled Data. Please see the tool tip for more info.
Use the Run info file option to point to the RunInfo.xml file. It is located at the top level of the run directory.
Use the Sample sheet file to point to the sample sheet file, which is usually a .csv file. Partek Flow can accept 10X Genomics' "simple" and Illumina Experiment Manager (IEM) sample sheet format, which utilize 10X Genomics' sample index set codes. Each index set code corresponds to a mixture of four sample index sequences per sample. Alternatively, Partek Flow will also accept a sample sheet file that has been correctly formatted using the sample sheet generator provided by 10X Genomics.
Then click on the Configure link and make the following changes (Figure 2).
Min trimmed read length: 8
Mask short adapter reads: 8
Use bases mask: see below
Create fastq for index reads: OFF
Ignore missing bcls: ON
Ignore missing filter: ON
Ignore missing positions: ON
Ignore missing controls: ON
For the Use bases mask option, the read structure for the Chromium Single Cell 3' v2 prep kit is typically Y26,I8,Y98. The setting for Chromium Single Cell 3' v3/v3.1 is typically Y28,I8,Y91. Please check the read structure detailed in the RunInfo.xml file and adjust the values to match your data.
Click Apply to accept and then Finish to import your files.
Left single clicking on any task (the rectangles) in the analysis pipeline will cause a Task Actions section to appear in the pop-up menu. This allows users to:
Rerun tasks: rerun the selected task. The task dialog will pop up so that users can change the task's parameters. Previous downstream analyses of the selected task will not be rerun.
Rerun with downstream tasks: rerun the selected task. The task dialog will pop up so that users can change the parameters of the current task; the downstream analyses will then be rerun with the same configuration as before.
Edit description: replace the task's description by manually typing in a new one.
Change color: choose a color to apply only on the selected task by clicking on Apply. Click Apply to downstream to change the selected task and the downstream pipeline color to the newly selected color.
Delete task: this option is only available if the user is the owner of the project or the owner of the task. When a task is deleted, all downstream tasks, including tasks from other users, will be deleted. Users may check the box to choose to delete the task's output files. If delete output files is not checked, the task will be removed from the pipeline, but the output files of the task will remain on the disk.
Restart task: this option is only available on failed tasks and requires an admin role, but the admin does not need to own the task. Restarting a task as an admin will not take up a concurrent seat, and the disk space consumed by the output files will count towards the storage space of the task's original owner.
The ERCC (External RNA Control Consortium) developed a set of RNA standards for quality control in microarray, qPCR, and sequencing applications. These RNA standards are spiked-in RNA with known concentrations and composition (i.e. sequence length and GC content). They can be used to evaluate sensitivity and accuracy of RNA-seq data.
The ERCC analysis is performed on unaligned data, if the ERCC RNA standards have been added to the samples. There are 92 ERCC spike-in sequences with different concentrations and different compositions. The idea is that the raw data are aligned (with Bowtie) to the known ERCC RNA sequences to get the count of each ERCC sequence. This information is available within Partek Flow and is used to plot the correlation between the observed counts and the expected concentrations. If there is a high correlation between the observed counts and the expected concentrations, you can be confident that the quantified RNA-seq data are reliable. Partek Flow supports the Mix 1 and Mix 2 ERCC formulations. Both formulations use the same ERCC sequences, but each sequence is present at a different expected concentration. If both the Mix 1 and Mix 2 formulations have been used, an ExFold comparison can be performed to compare the observed and expected Mix1:Mix2 ratio for each spike-in.
To start ERCC assessment, select an unaligned reads node and choose ERCC in the context sensitive menu. If all samples in the project have used the Mix 1 or Mix 2 formulation, choose the appropriate radio button at the top (Figure 1).
If some samples have been treated with the Mix 1 formulation and others with the Mix 2 formulation, choose the ExFold comparison radio button (Figure 2). Set up the pairwise comparisons by choosing the Mix 1 and Mix 2 samples that you wish to compare from the drop-down lists, followed by the green plus icon. The selected pair of samples will be added to the table below.
You can change the Bowtie parameters by clicking Configure before the alignment (Figure 1), although the default parameters work fine for most data. Once the task has been set up correctly, select Finish.
The ERCC task report starts with a table (Figure 3), which summarizes the results at the project level. The table shows which samples use the Mix 1 or Mix 2 formulation. The total number of alignments to the ERCC controls is also shown, further divided into the total number of alignments to the forward strand and the reverse strand. The summary table also gives the percentage of ERCC controls that contain alignment counts (i.e. are present). Generally, the fraction of present controls should be as high as possible; however, certain ERCC controls may not contain alignment counts due to their low concentration, and that information is useful for evaluating the sensitivity of the RNA-seq experiment. The coefficient of determination (R squared) of the present ERCC controls is listed in the next column. As a rule of thumb, you should expect a good correlation between the observed alignment counts and the actual concentrations, or else the RNA-seq quantification results may not be accurate. Finally, the last two columns give estimates of bias with regard to sequence length and GC content, by giving the correlation of the alignment counts with the sequence length and the GC content, respectively.
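Conceptually, the reported R squared is the squared Pearson correlation between the log2 observed counts and the log2 expected concentrations of the present controls. A sketch with made-up numbers (not real ERCC values):

```python
import math

def r_squared(x, y):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return (cov / (sx * sy)) ** 2

# Illustrative values only: log2 expected concentration vs log2 observed counts
log2_conc = [0.0, 2.0, 4.0, 6.0, 8.0]
log2_counts = [1.1, 3.0, 4.9, 7.2, 8.8]
print(round(r_squared(log2_conc, log2_counts), 3))  # 0.998
```

A value this close to 1 would indicate a tight dose-response relationship; a much lower value would suggest the quantification may not be accurate.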
If ExFold comparison was enabled, an extra table will be produced in the ERCC task report (Figure 4). Each row in the table is a pairwise comparison. This table lists the percentage of ERCC controls present in the Mix 1 and Mix 2 samples and the R squared for the observed vs expected Mix1:Mix2 ratios.
The ERCC spike-ins plot (Figure 5) shows the regression lines between the actual spike-in concentration (x-axis, given in log2 space) and the observed alignment counts (y-axis, given in log2 space), for all the samples in the project. The samples are depicted as lines, and the probes with the highest and lowest concentration are highlighted as dots. The regression line for a particular sample can be turned off by simply clicking on the sample name in the legend beneath the plot.
Optionally, you can invoke a principal components analysis plot (View PCA), which is based on RPKM-normalised counts, using the ERCC sequences as the annotation file (not shown).
For more details, go to the sample-level report (Figure 6) by selecting a sample name on the summary table. First, you will get a comprehensive scatter plot of observed alignment counts (y-axis, in log2 space) vs. the actual spike-in concentration (x-axis, in log2 space). Each dot on the plot represents an ERCC sequence, coloured based on GC content and sized by sequence length (plot controls are on the right).
The table (Figure 7) lists individual controls, with their actual concentration, alignment counts, sequence length, and % GC content. The table can be downloaded to the local computer by selecting the Download link.
For more details on ExFold comparisons, select a comparison name in the ExFold summary table (Figure 8). First, you will get a comprehensive scatter plot of observed Mix1:Mix2 ratios (y-axis, in log2 space) vs. the expected Mix1:Mix2 ratio (x-axis, in log2 space). Each dot on the plot represents an ERCC sequence, coloured based on GC content and sized by sequence length (plot controls are on the right).
The table (Figure 9) lists individual controls, with each samples' alignment counts, together with the observed and expected Mix1:Mix2 ratios. The table can be downloaded to the local computer by selecting the Download link.
Select Single cell, choose the assay type (scRNA-Seq, Spatial transcriptomics, scATAC-Seq, V(D)J, Flow/Mass cytometry), and select the data format (Figure 1). Use the Next button to proceed with import.
Partek Flow supports single cell data analysis in count matrix text format using the Full count matrix data format (Figure 2). Each matrix text file is assumed to represent one sample, and each value in the matrix represents the expression value of a feature (e.g. a gene or a transcript) in a cell. The expression value can be a raw count or a normalized count. The format requirements for each text file are the same as for count matrix data.
Specify the text file location. Only one text file (in other words, one sample) can be imported at a time. A preview of the file will be displayed; configuration of the file format is the same as for Import count matrix data. In addition, you need to specify the details about this file.
Click Finish and the sample will be imported; on the Data tab, the number of cells in the sample will be displayed.
To import multiple samples, repeat the above steps by clicking Import data on the Metadata tab or within the task menu (toolbox) on the Analyses tab. Make the same previous selections using the cascading menu.
The primary sequencing output of an Illumina sequencer is a set of per-cycle base call (bcl) files, which first need to be converted to fastq format so that the data can be pushed to downstream applications. Partek Flow software comes with a conversion tool that can be used to import data in the bcl file format. In addition to the file conversion, this tool also demultiplexes the bcl files in the same step and outputs demultiplexed fastq files as the result.
We recommend you start by transferring the entire Illumina run folder to the Partek Flow server. To start a new project with bcl files, first select bcl under the Other import tab (Figure 1)
The resulting window shows the configuration dialog (Figure 2).
The bcl files hold the base calls and are in the Data directory within the whole Illumina run folder. Note that the Data directory file path needs to point to the directory, not to an individual bcl file.
The RunInfo.xml file is generated by the primary analysis software and contains information on the run, flow cell, instrument, time stamp, and the read structure (number of reads, number of cycles per read, whether a read is an index read). This file is typically stored at the top level in the Illumina run folder.
The SampleSheet.csv file provides the information on the relationship between the samples and indices specified during library creation. Although it has four sections, two sections (Settings and Data) are important for the data import and conversion. For more information on the files, consult Illumina documentation.
Selecting the Configure option under the Advanced options section enables a granular control of the import (Figure 3).
The Select tiles option (--tiles) enables the user to process only a subset of tiles available in the flow cell. The input for this option is a comma-separated list of regular expressions.
Min trimmed read length (--minimum-trimmed-read-length) specifies the minimum read length after adapter removal.
Mask short adapter reads (--mask-short-adapter-reads) applies when a read is trimmed below the length specified by Min trimmed read length. If the number of bases after adapter removal is less than Min trimmed read length, the read length is forced back to Min trimmed read length by replacing the removed adapter bases with Ns. If the number of remaining bases falls below Mask short adapter reads, all the bases in the read are replaced with Ns.
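The interplay of the two thresholds can be sketched as follows; this is an illustrative reading of the behaviour described above, not the converter's actual code, and the threshold values shown are hypothetical.

```python
def mask_trimmed_read(read, trimmed_len, min_trimmed, mask_short):
    """'read' is the original sequence; 'trimmed_len' is the number of
    bases remaining after adapter removal (illustrative sketch only)."""
    if trimmed_len < mask_short:
        # Too few bases remain: replace all bases in the read with Ns.
        return "N" * len(read)
    if trimmed_len < min_trimmed:
        # Pad back to the minimum length, masking adapter bases with Ns.
        return read[:trimmed_len] + "N" * (min_trimmed - trimmed_len)
    return read[:trimmed_len]

print(mask_trimmed_read("ACGTACGTACGTACGT", 10, min_trimmed=12, mask_short=6))
# ACGTACGTACNN
```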
Adapter stringency (--adapter-stringency) specifies the minimum match rate that triggers the masking or trimming of adapters. The rate is calculated as MatchCount / (MatchCount + MismatchCount). Only the reads exceeding the specified rate of sequence identity with adapters are trimmed.
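For example, the rate for a candidate adapter hit can be computed directly from the formula above:

```python
def adapter_match_rate(match_count, mismatch_count):
    """Sequence identity of a candidate adapter hit:
    MatchCount / (MatchCount + MismatchCount)."""
    return match_count / (match_count + mismatch_count)

print(adapter_match_rate(18, 2))  # 0.9 (18 matches, 2 mismatches)
```

A hit like this, with 90% identity, would be trimmed only if the configured stringency is 0.9 or lower.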
Barcode mismatches (--barcode-mismatches) controls the number of allowed mismatches per index sequence.
Use bases mask (--use-bases-mask) defines a custom read structure that may be different to the structure specified in the RunInfo.xml file. The input for this option is a comma-separated list where Y and I are followed by a number indicating how many sequencing cycles to include in the fastq file. For example, if the option is set to Y26,I8,Y98, 26 cycles (26bp) will be used to generate the R1 sequence, 8 cycles (8bp) will be used for the sample index, and 98 cycles (98bp) will be used to generate the R2 sequence.
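A small parser (a hypothetical helper, not part of Partek Flow) illustrates how such a mask string breaks down into read segments; it only handles the simple comma-separated numeric form shown above.

```python
import re

def parse_bases_mask(mask):
    """Split a mask such as 'Y26,I8,Y98' into (type, cycles) pairs:
    'Y' = cycles used in a sequence read, 'I' = cycles used as a sample index."""
    segments = []
    for part in mask.split(","):
        m = re.fullmatch(r"([YI])(\d+)", part.strip(), flags=re.IGNORECASE)
        if not m:
            raise ValueError(f"unrecognised mask segment: {part!r}")
        segments.append((m.group(1).upper(), int(m.group(2))))
    return segments

print(parse_bases_mask("Y26,I8,Y98"))
# [('Y', 26), ('I', 8), ('Y', 98)]  ->  26 bp R1, 8 bp index, 98 bp R2
```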
Do not split files by lane (--no-lane-splitting) prevents splitting of fastq files by lane, i.e. the converter will merge multiple lanes and generate one fastq file per sample.
Create fastq for index reads (--create-fastq-for-index-reads) creates an extra fastq file for each sample containing the sample index sequence for each read. This will be imported as an extra sample into the project.
Ignore missing bcls (--ignore-missing-bcl) will interpret missing base call files as N.
Ignore missing filter (--ignore-missing-filter) will ignore missing filter files and assume all clusters pass the filter.
Ignore missing positions (--ignore-missing-positions) will write new, unique coordinates into the header line if the cluster location files are missing.
Ignore missing controls (--ignore-missing-control) will interpret missing control files as missing not-set control bits.
Save undetermined fastq will take the reads that could not be assigned to a sample index and collect them into an Undetermined_S0.fastq file, which will be imported as a new sample.
The result of the import is an Unaligned reads data node, containing demultiplexed fastq files.
For more information about the BCL to FASTQ conversion tool, including information on the proper folder structure and instructions for formatting the SampleSheet.csv file, please consult the bcl2fastq2 Conversion Software Guide.
The Partek Flow Home page is the first page displayed upon login. It provides a quick overview of recent activities and provides access to several system options (Figure 1).
To access a project, click the blue project name. The projects can be sorted by the column title and searched by criteria using the search options on the right and the search bar above the project table. The Repository summary shows the total number of projects, samples, and cells in your Partek Flow server (Figure 1).
Figure 1. Partek Flow Home page
Shown at the top of the Home page, the New project button provides a quick link to create a new project in your Partek Flow server.
The Transfer files button is used to transfer data to the server.
The Optional columns button can be used to add additional column information to the project table.
The Search button will search for project names and descriptions that have been typed into the search bar.
Additional project details can also be opened and closed individually using the arrow to the left of the project name. To open additional project details for all projects at once, use the arrow to the left of the Project name column header.
The table listing all the projects can be sorted by clicking the sort icon to the right of the table headers. By default, the table is sorted by the date when the project was last modified.
Under the Actions column, three vertical dots will open the project actions options: Open in new tab, Export project, and Delete project (Figure 2).
Figure 2. Main icons of the Home page
The drop-down menu in the upper right corner of the Home page (Figure 3) displays options that are not related to a project or a task, but to the Partek Flow application as a whole. This links to the Settings and Profile and gives you the option to log out of your server.
Figure 3. Accessing the system options
The left-most icon will bring you back to the Home page with one click.
The next icon is the progress indicator, summarizing the current status of the Partek Flow server. If no tasks are being processed, the icon is grey and static and the idle message is shown upon mouse over. Clicking on this icon will direct you to the System resources under Settings (Figure 4).
Figure 4. Progress indicator showing no tasks in progress
If the server is running, the progress indicator will depict green bars animated on the server icon (Figure 5).
Figure 5. Progress indicator showing Partek Flow server is active
Selecting the Queue dropdown will list the number of running tasks launched by the user. Additional information about the queue, including the estimated completion time as well as the total number of queued tasks launched, can be obtained by selecting View queued tasks (Figure 6).
Figure 6. Viewing queued tasks
To view all previous tasks, select the View recent activity link. Clicking this link loads the Activity log page (Figure 7). It displays all the tasks within the projects accessible to the user (either as the owner or as a collaborator), including the tasks launched by other users of the Partek Flow instance.
Figure 7. Activity log pages shows all the tasks within the projects accessible to the user
The Display radio buttons enable filtering of the log. All activity will show all the tasks (irrespective of the task owner, i.e. the user starting the task), while My activity lists only the tasks started by the current user. In contrast to the latter, Collaborator’s activity displays the tasks that are not owned by the current user (but to which the user has access as a collaborator).
The Activity log page also contains a search function that can help find a particular task (Figure 8). Search can be performed through the entire log (All columns), or narrowed down to one of the columns (using the drop-down list).
Figure 8. A search term entered in the search box
The Home page lists the most recent projects that have been performed on the server. By default, the table contains all the projects owned by the current user or ones where the user is a collaborator. The list entries are links that automatically load the selected project.
The Search box can be used to find specific projects based on project titles and descriptions. You can also Search by individual or multiple criteria based on projects Owners, Members, Cell count, Sample count, Cell type, Organ, and Organism (Figure 9). Cell type, Organ, Organism, cell count, and sample count are sourced from the metadata tab. Owner and members are sourced from the project settings tab.
Figure 9. Filtering the projects
The criteria used for the search are listed above the table along with the number of projects containing the criteria.
The project table displays optional columns such as the project name, owner, your role, the members that have access to the project, date last modified, size, number of samples, and number of cells. The drop-down contains a thumbnail, a short description, the samples contained within the project, and the queue status (Figure 10).
The project settings tab within each project can be used to modify details such as the project name, thumbnail, description, species, members, and owner.
Figure 10. The project table details
If a project is publicly available in the Gene Expression Omnibus (GEO) and European Nucleotide Archive (ENA) databases, you can import associated FASTQ files and sample attributes automatically into Partek Flow.
On the Homepage click New Project to create a project and give the project a name
Click Add data
Select fastq as the file type after choosing Single cell or Bulk as the assay type
Click Next
Choose GEO / ENA
Enter the BioProject ID of the data set you would like to download. The format of a BioProject ID is PRJNA followed by one to six numbers (e.g. PRJNA381606)
A GEO ID can also be used in the format GSE followed by one to five numbers (e.g. GSE71578).
Click Finish
It may take a while for the download to complete depending on the size of the data. FASTQ files are downloaded from the ENA BioProject page.
FASTQ files will be added as an Unaligned reads data node in the Analyses tab
If the study is not publicly available in both GEO and ENA, project import will not succeed.
If there is an ENA project, but the FASTQ files are not available through ENA, the project will be created, but data will not be imported.
A variety of other issues and irregularities can cause imports to not succeed or partially succeed, including, but not limited to, a BioProject having multiple associated GSE IDs, incomplete information on the GEO or ENA page, and either the GEO or ENA project not being publicly available.
The Gene Expression Omnibus (GEO) and the European Nucleotide Archive (ENA) are web-accessible public repositories for genomic data and experiments. Access and learn more about their resources at their respective websites:
You can search ENA using the GEO ID (e.g., GSE71578) to check if there is a matching ENA project.
Open the Study result to view the BioProject ID (e.g., PRJNA381606) and a table with information about the samples and files included in the project
The Task Menu lists all the tasks that can be performed on a specific node. It can be invoked from either a Data or Task node and appears on the right hand side of the Analyses tab. It is context-sensitive, meaning that it will only present tasks that the user can perform on the selected node. For example, selecting an Aligned reads data node will not present aligners as options.
Clicking a Data node presents a variety of tasks:
Clicking a Task node gives you the option to view the Task results or perform Task actions such as rerunning the task (Figure 1).
Partek Flow contains a number of quality control tools and reports that can be used to evaluate the current status of your analysis and decide downstream steps. Quality control tools are organized under the Quality Assurance / Quality Control (QA/QC) section of the context-sensitive menu and are available for unaligned and aligned reads data nodes.
This section will illustrate:
In addition to the tools listed above, many other functionalities can also be interpreted in the sense of quality control, for instance principal components analysis, hierarchical clustering (at the sample level), the variant detection report, and the quantification report.
Selecting a node with unaligned reads (either Unaligned reads or Trimmed reads) shows the QA/QC section in the context sensitive menu, with two options (Figure 1). To assess the quality of your raw reads, use Pre-alignment QA/QC.
Pre-alignment QA/QC setup dialog is given in Figure 2. Examine reads allows you to control the number of reads processed by the tool; All reads, or a subset (One of every n reads). The latter option is obviously not as thorough, but is much faster than All reads.
If selected, K-mer length creates a per-sample report with the position of the most frequent k-mers (i.e. sequences of k nucleotides) of the length specified in the dialog. The range of input values is from one to 10.
The last control refers to .fastq files. Partek® Flow® can automatically detect the quality encoding scheme (Auto detect) or you can use one of the options available in the drop-down list. However, the auto-detection is only applicable to the Phred+33 and Phred+64 quality encoding schemes. For early versions of the Solexa quality encoding, select Solexa+64 from the Quality encoding drop-down list. For paired-end data, the pre-alignment QA/QC will be performed on each read in the pair separately and the results will be shown separately as well.
Most sequencing applications now use the phred quality score. This score indicates the probability that the base was accurately identified. The table below shows the corresponding base call accuracies for each score:
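A Phred score Q corresponds to an error probability of 10^(-Q/10), so the base call accuracy is 1 - 10^(-Q/10); for example:

```python
def base_call_accuracy(q):
    """Probability that a base call with Phred quality score q is correct."""
    return 1.0 - 10 ** (-q / 10)

for q in (10, 20, 30, 40):
    print(f"Q{q}: {base_call_accuracy(q):.4%}")
# Q10 -> 90%, Q20 -> 99%, Q30 -> 99.9%, Q40 -> 99.99% accuracy
```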
The task report is organised in two tiers. The initial view shows the project-level report with all the samples. An overview table is at the top, while the matching plots are below.
Two project-level plots are Average base quality per position and Average base quality score per read (Figure 4). The latter plot presents the proportion of reads (y-axis) with certain average quality score (meaning all the base qualities within a read are averaged; x-axis). Mouse over a data point to get the matching readouts. The Save icon saves the plot in a .svg format to the local machine. Each line on the plot represents a data file and you can select the sample names from the legend to hide/un-hide individual lines.
A sample-level report begins with a header, which is a collection of typical quality metrics (Figure 5).
Below the header you will find four plots: Base composition, Average base quality score per position (same as above, but on the sample level), Distribution of base quality scores (the same as Average base quality score per read, but on the sample level), and Distribution of read lengths.
Distribution of read lengths shows a single column for fixed-length data (e.g. Illumina sequencing). However, for quality-trimmed data or non-fixed-length data (like Ion Torrent sequencing), expect to see a distribution of read lengths (Figure 7).
If the K-mer length option was turned on when setting up the task, an additional plot is added to the sample-level report: K-mer content (Figure 8). For each position, the K-mer composition is given, but only the top six most frequent K-mers are reported; high frequency of a K-mer at a given site (enrichment) indicates the possible presence of sequencing adapters in the short reads.
The pre-alignment QA/QC report as described above is generally available for the NGS data of fastq format. For other types of data, the report may differ depending on the availability of information. For example, for fasta format, there is no base quality score information and therefore all the figures or graphs related to base or read quality score will be unavailable.
The Data summary report in Partek Flow provides an overview of all tasks performed as part of a pipeline. This is particularly useful for report writing, record keeping and revisiting projects after a long period of time.
This user guide will cover the following topics:
Click on an output data node under the Analyses tab of a project and choose Data summary report from the context sensitive menu on the right (Figure 1). The report will include details from all of the tasks upstream of the selected node. If tasks have been performed downstream of the selected data node, they will not be included in the report.
The Data summary report can be saved in different formats via the web browser. The instructions below are for Google Chrome. If you are using a different browser, consult your browser's help for equivalent instructions.
On the Data summary report, expand all sections and show all task details. Right-click anywhere on the page and choose Print... from the menu (Figure 4) or use Ctrl+P (Command+P on Mac). In the print dialog, click Change… (Figure 5) and set the destination to Save as PDF. Select the Background graphics checkbox (optional), click the blue Save button (Figure 5) and choose a file location on your local machine.
The PDF can be attached to an email and/or opened in a PDF viewer of your choice.
On the Data summary report, right-click anywhere on the page and choose Save as… from the menu (Figure 6) or use Ctrl+S (Command+S on Mac). Choose a file location on your local machine and set the file type to Web Page, Complete.
The HTML file can be opened in a browser of your choice.
The short video clip below (with audio) provides a tutorial on the Data summary report.
Post-alignment QA/QC is available for data nodes containing aligned reads (Aligned reads) and has no special control dialog. Similar to the pre-alignment QA/QC report, the post-alignment report has two tiers: a project-level report and a sample-level report.
The project-level report starts with a summary table (Figure 1). Unlike the pre-alignment QA/QC report, each row now corresponds to a sample (sample names are hyperlinks to the sample-level reports). The table allows for a quick comparison across all the samples within the project, so any outlying sample can easily be spotted.
Note that the summary table reflects the underlying chemistry. While Figure 1 shows a summary table for single-end sequencing, an example table for paired-end sequencing is given in Figure 2. Common features are discussed first.
The first two columns contain the total number of reads (Total reads) and the total number of alignments (Total alignments). In theory, for single-end chemistry, the total number of reads equals the total number of alignments. For paired-end reads, the theoretical expectation is twice as many alignments as reads (the term “read” refers to the fragment being sequenced, and since each fragment is sequenced from two directions, one can expect two alignments per fragment). When counting the actual number of alignments (Total alignments), however, reads that align more than once (multimappers) are also taken into account. Next, the Aligned column contains the fraction of all the reads that were aligned to the reference assembly.
The Coverage column shows the fraction (%) of the reference assembly that was sequenced, and the average sequencing coverage (×) of the covered regions is in the Avg. coverage depth column. Avg. quality is the mapping quality, as reported by the aligner (not all aligners support this metric). Avg. length is the average read length, and the average read quality is given in the Avg. quality column. Finally, %GC is the fraction of G or C calls.
In addition, the Post-alignment QA/QC report for single-end reads (Figure 1) contains the Unique column. This refers to the fraction of uniquely aligned reads.
On the other hand, the Post-alignment QA/QC report for paired-end reads (Figure 2) contains these columns:
Unique singleton: fraction of alignments corresponding to reads where only one of the paired reads can be uniquely aligned
Unique paired: fraction of alignments corresponding to reads where both of the paired reads can be uniquely aligned
Non-unique singleton: fraction of singletons that align to multiple locations
Non-unique paired: fraction of paired reads that align to multiple locations
Note: for paired-end reads, if one end is aligned but its mate is not, the aligned read is counted in the numerator of the alignment rate. Since the mate is not aligned, the read is also included in the unaligned data node (if the generate unaligned reads data node option is selected) for second-stage alignment. This creates a discrepancy between total reads and "unaligned reads + total reads * alignment rate", because reads with only one mate aligned are counted twice.
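A small worked example (all numbers invented) may make this bookkeeping clearer:

```python
# Hypothetical paired-end sample: 1,000 fragments sequenced from both ends.
total_reads = 2000
both_mates_aligned = 1800   # reads whose mate also aligned
one_mate_aligned = 100      # reads that aligned but whose mate did not

# The alignment rate counts every aligned read, including those whose mate failed.
aligned_reads = both_mates_aligned + one_mate_aligned   # 1900
alignment_rate = aligned_reads / total_reads            # 0.95

# The unaligned data node receives the unaligned mates AND the reads paired
# with them, so reads with only one mate aligned appear on both sides.
unaligned_node = (total_reads - aligned_reads) + one_mate_aligned   # 200

# Naive bookkeeping over-counts by the number of half-aligned pairs:
# unaligned_node + total_reads * alignment_rate = 200 + 1900 = 2100,
# which exceeds total_reads (2000) by exactly one_mate_aligned (100).
```

The discrepancy equals the number of reads with only one aligned mate, which is why the two totals do not reconcile.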
In addition to the summary table, several graphs are plotted to give a comparison across multiple samples in the project. Those graphs are Alignment breakdown, Coverage, Genomic Coverage, Average base quality per position, Average base quality score per read, and Average alignments per read. Two of those (Average base quality plots) have already been described.
The alignment breakdown chart (Figure 3) presents each sample as a column, and has two vertical axes (i.e. Alignment percent and Total reads). The percentage of reads with different alignment outcomes (Unique paired, Unique singleton, Non-unique, Unaligned) is represented by the left-side y-axis and visualized by stacked columns. The total number of reads in each sample is given using the black line and shown on the right-side y-axis.
The Coverage plot (Figure 4) shows the Average read depth (in covered regions) for each sample using columns and can be read off the left-hand y-axis. Similarly, the Genomic coverage plot shows genome coverage in each sample, expressed as a fraction of the genome.
The last graph is Average alignments per read (Figure 5) and shows the average number of alignments for each read, with samples as columns. For single-end data, the expected average alignments per read is one, while for paired-end data, the expected average alignments per read is two.
Coverage report is also available for data nodes containing aligned reads (Aligned reads, Trimmed reads, or Filtered reads). The purpose of the report is to understand how well the genomic regions of interest are covered by sequencing reads for a particular analysis.
When setting up the task (Figure 1), you first need to specify the Genome build and then a Gene/feature annotation file, which defines the genomic regions you are interested in (e.g. exome or genes within a panel). The Gene/feature annotation can be previously associated with Partek® Flow® via Library File Management or added on the fly.
The complete coverage report will contain the percentage of bases within the specified genes / features with coverage greater than or equal to the coverage levels defined under Add minimum coverage levels. To add a level, click the green plus icon; to remove one, click the red cross icon.
As for the Advanced options, if Strand-specificity is turned on, only reads which match the strand of a given region will be considered for that region’s coverage statistics.
Generate target enrichment graphs will generate a graphical overview of coverage across each feature.
When Use multithreading is checked, the computation will utilize multiple CPUs. However, some file systems, such as GPFS, do not handle multithreaded tasks well; if the input or output data resides on such a file system, unchecking this option will prevent task failures.
Quantification of on- and off-target reads is also displayed in the column chart below the table (Figure 3), showing each sample as a separate column and fraction of on-/off-target reads on the y-axis.
Region coverage summary hyperlink opens a new page, with a table showing average coverage for each region (rows), across the samples (columns) (Figure 4).
The Coverage summary plot (Figure 6) is an overview of coverage across the targeted genomic features for all the samples in the project. Each line within the plot is a single sample, the horizontal axis is the normalized position within the genomic feature, represented as the 1st to 100th percentile of the length of the feature, and the vertical axis shows the average coverage (across all the features for a given sample).
If you need more details about a sample, click on the sample name in the Coverage report table (Figure 7). The columns are as follows:
Region name: the genomic feature identifier (as specified in the annotation file)
Chromosome: the chromosome of the genomic feature (or region)
Start: the start position of the genomic feature (1-based)
Stop: the stop position of the genomic feature (exclusive, i.e. the base at the stop position is not included)
Strand: the strand of the genomic feature
Total exon length: the length of the genomic feature
Reads: the total number of reads aligning to the genomic feature
% GC: the GC content (%) of the reads aligning to the genomic feature
% N: the percentage of ambiguous bases (N) of those reads aligning to the genomic feature
(n)x: the proportion of the genomic feature which is covered by at least n number of alignments. [Note: n is the coverage level that you specified when submitting Coverage report task, defaults are 1×, 20×, 100×]
Average coverage: the average sequencing depth across all bases in the genomic feature
Average quality: the average quality score across covered bases in the genomic feature
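To make the per-region statistics concrete, here is a minimal sketch (depth values are invented) of how Average coverage and the (n)x columns could be derived from per-base alignment depths:

```python
# Per-base alignment depths across a hypothetical 10 bp region.
depths = [0, 1, 5, 22, 30, 30, 18, 7, 2, 0]

region_length = len(depths)

# Average coverage: mean depth across all bases in the region.
avg_coverage = sum(depths) / region_length                      # 11.5

# (n)x columns: fraction of the region covered by at least n alignments.
pct_1x = sum(d >= 1 for d in depths) * 100 / region_length      # 80.0
pct_20x = sum(d >= 20 for d in depths) * 100 / region_length    # 30.0
```

Here 80% of the region has at least 1× coverage, but only 30% reaches the 20× level, which is the kind of discrepancy the minimum coverage levels are designed to surface.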
Validate variants is available for data nodes containing variants (Variants, Filtered variants, or Annotated Variants). The purpose of this task is to understand the performance of the variant calling pipeline by comparing variant calls from a sample within the project to known “gold standard” variant data that already exist for that sample. This “gold standard” data can encompass variants identified with high confidence by other experimental or computational approaches.
Setting up the task (Figure 1) involves identifying the Genome Build used for variant detection and the Sample to validate within the project. Target specific regions allow for specification of the Target regions for this study, relating to the regions sequenced for all samples in the project. Benchmark target regions represent the regions that have been previously interrogated to identify “gold standard” variant calls in the sample of interest. These parameters are important to ensure that only overlapping regions are compared, avoiding the identification of false positive or false negative variants in regions covered by only the project sample or the “gold standard” sample. Both sections utilize a Gene/feature annotation file, which can be previously associated with Partek Flow via Library File Management or added on the fly. The Validated variants file is a single sample vcf file containing the “gold standard” variant calls for the sample of interest and can be previously associated with Partek Flow as a Variant Annotation Database via Library File Management or added on the fly.
The Validate variants results page contains statistics related to the comparison of variants in the project sample compared to the validated variant calls for the sample (Figure 2). The results are split into two sections, one based on metrics calculated from the comparison of SNVs and the other from the comparison of INDELs.
The following SNP-level metrics are contained within the report, comparing the sample in the project to the validated variant data:
No genotypes: the number of missing genotypes from the sample in the Flow project
Same as reference: the number of homozygous reference genotypes from the sample in the Flow project
True positives: the number of variant genotypes from the sample in the Flow project that match the validated variants file
False positives: the number of variant genotypes from the sample in the Flow project that are not found in the validated variants file
True negatives: the number of loci that do not have variant genotypes in the sample in the Flow project and the validated variants file
False negatives: the number of genotypes that do not have variant genotypes in the sample in the Flow project but do have variant genotypes in the validated variants file
Sensitivity: the proportion of variant genotypes in the validated variants file that are correctly identified in the sample in the Flow project (true positive rate)
Specificity: the proportion of non-variant loci in the validated variants file that are non-variant in the sample in the Flow project (true negative rate)
Precision: the number of true positive calls divided by the number of all variant genotypes called in the sample in the Flow project (positive predictive value),
F-measure: a measure of the accuracy of the calling in the Flow pipeline relative to the validated variants. It considers both the precision and the recall of the test to compute the score. The best value is 1 (perfect precision and recall) and the worst is 0.
Matthews correlation: a measure of the quality of classification, taking into account true and false positives and negatives. The Matthews correlation is a correlation coefficient between the observed and predicted classifications, ranging from −1 to +1. A coefficient of +1 represents a perfect prediction, 0 is no better than random prediction, and −1 indicates total disagreement between prediction and observation.
Transitions: variant allele interchanges of purines or pyrimidines in the sample in the Flow project relative to the reference
Transversions: variant allele interchanges of purines to/from pyrimidines in the sample in the Flow project relative to the reference
Ti/Tv ratio: ratio of transition to transversions in the sample in the Flow project
Heterozygous/Homozygous ratio: the ratio of heterozygous to homozygous genotypes in the sample in the Flow project
Percentage of sites with depth < 5: the percentage of variant genotypes in the sample in the Flow project that have fewer than 5 supporting reads
Depth, 5th percentile: the 5th percentile of sequencing depth across all variant genotypes in the sample in the Flow project
Depth, 50th percentile: the 50th percentile (median) of sequencing depth across all variant genotypes in the sample in the Flow project
Depth, 95th percentile: the 95th percentile of sequencing depth across all variant genotypes in the sample in the Flow project
The INDEL-level metrics columns contained within the report are identical, except that transition and transversion information is not reported.
The Single-cell QA/QC task in Partek Flow enables you to visualize several useful metrics that will help you include only high-quality cells. To invoke the Single-cell QA/QC task:
Click a Single cell counts data node
Click the QA/QC section of the task menu
Click Single cell QA/QC
By default, all samples are used to perform QA/QC. You can choose to split by sample and perform QA/QC separately for each sample.
If your Single cell counts data node has been annotated with a gene/transcript annotation, the task will run without a task configuration dialog. However, if you imported a single cell counts matrix without specifying a gene/transcript annotation file, you will be prompted to choose the genome assembly and annotation file by the Single cell QA/QC configuration dialog (Figure 1). Note, it is still possible to run the task without specifying an annotation file. If you choose not to specify an annotation file, the detection of mitochondrial counts will not be possible.
The Single cell QA/QC task report opens in a new data viewer session. Four dot and violin plots showing the value of every cell in the project are displayed on the canvas: counts per cell, detected features per cell, the percentage of mitochondrial counts per cell, and the percentage of ribosomal counts per cell (Figure 2).
If your cells do not express any mitochondrial genes or an appropriate annotation file was not specified, the plot for the percentage of mitochondrial counts per cell will be non-informative (Figure 3).
Mitochondrial genes are defined as genes located on a mitochondrial chromosome in the gene annotation file. The mitochondrial chromosome is identified in the gene annotation file by having "M" or "MT" in its chromosome name. If the gene annotation file does not follow this naming convention for the mitochondrial chromosome, Partek Flow will not be able to identify any mitochondrial genes. If your single cell RNA-Seq data was processed in another program and the count matrix was imported into Partek Flow, be sure that the annotation field that matches your feature IDs was chosen during import; Partek Flow will be unable to identify any mitochondrial genes if the gene symbols in the imported single cell data and the chosen gene/feature annotation do not match.
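A minimal sketch of the chromosome-naming convention described above (the exact matching rules, such as handling of a "chr" prefix and letter case, are assumptions, not Partek Flow's code):

```python
def is_mitochondrial(chromosome):
    """Return True if a chromosome name follows the 'M'/'MT' convention.

    Assumption: an optional 'chr' prefix is stripped and matching is
    case-insensitive.
    """
    name = chromosome.lower()
    if name.startswith("chr"):
        name = name[3:]
    return name in ("m", "mt")

# is_mitochondrial("chrM") and is_mitochondrial("MT") are True;
# is_mitochondrial("chr1") is False.
```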
Total counts are calculated as the sum of the counts for all features in each cell from the input data node. The number of detected features is calculated as the number of features in each cell with greater than zero counts. The percentage of mitochondrial counts is calculated as the sum of counts for known mitochondrial genes divided by the sum of counts for all features and multiplied by 100. The percentage of ribosomal counts are calculated as the sum of counts for known ribosomal genes divided by the sum of counts for all features and multiplied by 100.
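These calculations can be illustrated on a toy counts matrix (gene names and counts are invented):

```python
# Toy counts matrix: genes (rows) x 2 cells (columns).
counts = {
    "GAPDH":  [120, 80],   # ordinary gene
    "MT-CO1": [30, 5],     # mitochondrial gene (located on chromosome MT)
    "RPL13":  [50, 0],     # ribosomal gene
}
mito_genes = {"MT-CO1"}

cell_qc = []
for cell in range(2):
    total = sum(expr[cell] for expr in counts.values())           # total counts
    detected = sum(expr[cell] > 0 for expr in counts.values())    # detected features
    pct_mito = 100 * sum(counts[g][cell] for g in mito_genes) / total
    cell_qc.append({"counts": total, "features": detected, "pct_mito": pct_mito})

# cell_qc[0] -> {'counts': 200, 'features': 3, 'pct_mito': 15.0}
```

The percentage of ribosomal counts would be computed the same way, with the set of known ribosomal genes in place of `mito_genes`.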
Each point on the plots is a cell. All cells from all samples are shown on the plots. The overlaid violins illustrate the distribution of cell values for the y-axis metric.
The appearance of a plot can be configured by selecting a plot and adjusting the Configure settings in the panel on the left (Figure 4). Here are some suggestions, but feel free to explore the other options available:
Open Axes and change the Y-axis scale to Logarithmic. This can be helpful to view the range of values better, although it is usually better to keep the Ribosomal counts plot in linear scale.
Open Style and reduce the Color Opacity using the slider. For data sets with many cells, decreasing the dot opacity may help you better visualize the plot density.
Within Style switch on Summary Box & Whiskers. Inspecting the median, Q1, Q3, upper 90%, and lower 10% quantiles of the distributions can be helpful in deciding appropriate thresholds.
High-quality cells can be selected using Select & Filter, which is pre-loaded with the selection criteria, one for each quality metric (Figure 5).
Hovering the mouse over one of the selection criteria reveals a histogram showing the frequency distribution of the respective quality metric. The minimum and maximum thresholds can be adjusted by clicking and dragging the sliders or by typing directly into the text boxes for each selection criterion (Figure 6).
Alternatively, click Pin histogram to view all of the distributions at once, making it easier to determine thresholds (Figure 7).
Adjusting the selection criteria will select and deselect cells in all of the plots simultaneously. Depending on your settings, the deselected points will either be dimmed or gray. The filters are additive: combining multiple filters will include only the cells that pass all of them (the intersection). The number of cells selected is shown in the figure legend of each plot (Figure 8).
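The intersection behavior of the filters can be sketched as follows (thresholds and per-cell values are hypothetical):

```python
# Hypothetical per-cell QC values.
cells = [
    {"counts": 5000, "features": 1800, "pct_mito": 3.2},   # passes everything
    {"counts": 300,  "features": 150,  "pct_mito": 1.0},   # too few counts/features
    {"counts": 7000, "features": 2500, "pct_mito": 25.0},  # too much mitochondrial
]

# A cell is kept only if it satisfies every criterion (the intersection).
keep = [c for c in cells
        if c["counts"] >= 500
        and c["features"] >= 300
        and c["pct_mito"] <= 20.0]

len(keep)  # 1: only the first cell passes all criteria
```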
Select the input data node for the filtering task and click Select (Figure 10).
A new data node, Filtered counts, will be generated under the Analyses tab (Figure 11).
Double click the Filtered counts data node to view the task report. The report includes a summary of the count distribution across all features for each sample; a detailed breakdown of the number of cells included in the filter for each sample; and the minimum and maximum values for each quality metric (expressed genes, total counts, etc) across the included cells for each sample (Figure 12).
The Feature distribution plot visualizes the distribution of features in a counts data node.
To run Feature distribution:
Click a counts data node
Click the QA/QC section of the toolbox
Click Feature distribution
A new task node is generated with the Feature distribution report.
The Feature distribution task report plots the distribution of all features (genes or transcripts) in the input data node with one feature per row (Figure 1). Features are ordered by average value in descending order.
The plot can be configured using the panel on the left-hand side of the page.
Using the filter, you can choose which features are shown in the task report.
The Manual filter lets you type a feature ID (such as a gene symbol) and filter to matching features by clicking + . You can add multiple feature IDs to filter to multiple features (Figure 2).
Distributions can be plotted as histograms, where the x-axis is the expression value and the y-axis the frequency, or as strip plots, where the x-axis is the expression value and the position of each cell/sample is shown as a thin vertical line, or strip, on the plot (Figure 3).
To switch between plot types, use the Plot type radio buttons.
Mousing over a dot in the histogram plot gives the range of feature values that are being binned to generate the dot and the number of cells/samples for that bin in a pop-up (Figure 4).
Mousing over a strip shows the sample ID and feature value in a pop-up. If there are multiple cells/samples with the same value, only one strip will be visible for those cells/samples and the mouse-over will indicate how many cells/samples are represented by that one strip (Figure 5).
Clicking a strip will highlight that cell/sample in all of the plots on the page (Figure 6).
The grey dot in each strip plot shows the median value for that feature. To view the median value, mouse over the dot (Figure 7).
To navigate between pages, use the Previous and Next buttons, or type the page number in the text field and press Enter on your keyboard.
The number of features that appear in the plot on each page is set by the Items per page drop-down menu (Figure 8). You can choose to show 10, 25, or 50 features per page.
When Plot type is set to Histogram, you can choose to configure the Y-axis scale using the Scale Y-axis radio buttons. Feature max sets each feature plot y-axis individually. Page max sets the same y-axis range for every feature plot on the page, with the range determined by the feature with the highest frequency value.
You can add attribute information to the plots using the Color by drop-down menu.
For histogram plots, the histograms will be split and colored by the levels of the selected attribute (Figure 9). You can choose any categorical attribute.
For strip plots, the sample/cell strips will be colored by the levels or values of the selected attribute (Figure 10). You can choose any categorical or numeric attribute.
The Partek Flow Uploader is a Torrent Browser plugin that lets users upload run results to Partek Flow for further analysis.
The clip above (video only, no audio) shows the Partek Flow Uploader plugin in action.
Download the Partek® Flow® Uploader from the links below:
This is a compressed zipped file. Do not unzip.
Installation of the Plugin
Installation only needs to be performed once per Torrent Browser. All users of the same instance of Torrent Browser will be able to use the plugin. For future versions of the plugin, the steps below can also be used for updating.
To install the plugin, first log into Torrent Browser (Figure 1).
Navigate to Plugins under dropdown menu in upper-right corner under the gear icon (Figure 2).
Click the Install or Upgrade Plugin button (Figure 3).
Verify that the Partek Flow Uploader is listed and that the Enabled checkbox is selected (Figure 5).
From the plugins table, click the (Manage) gear icon for the Partek Flow Uploader and select Configure in the drop-down menu (Figure 6).
Global Partek Flow configuration settings can be entered into the plugin. When set, it will serve as the default for all users of the Torrent Browser. If multiple Partek Flow users are expected to run the plugin, it is recommended to leave the username and password fields blank so that individual users can enter them as needed.
In the configuration dialog (Figure 7), enter the Partek Flow URL, your username, and your password. Clicking Check configuration will verify your credentials and indicate whether a valid username and password have been entered. Click Save when done.
Click the Rescan Plugins for Changes button. Rescanning the plugins will finish the installation and save the configuration.
In the Torrent Browser, you can configure a Run Plan to include the Partek Flow Uploader. You can create a new Run Plan (from a Sample or a Template) or edit an existing Run Plan. In the example in Figure 8, the Partek Flow Uploader will be included in an existing Run Plan. From the Planned Runs page, click the gear icon in the last column, and choose Edit.
In the Edit Plan page, go to the Plugins tab (Figure 9) and select the checkbox next to the PartekFlowUploader.
Click the Configure hyperlink next to the PartekFlowUploader (Figure 10). If necessary, enter the Partek Flow URL, your username and your password. These are the same credentials you use to access Partek Flow directly in a web browser. Note that some fields may already be pre-populated depending on the global plugin configuration; you can edit the entries as needed. All fields are required to successfully run the plugin.
The Project Name field will be used in Partek Flow to create a new project where the run results will be exported. However, if a project with that name already exists, the samples will be added to that existing project. This enables you to combine multiple runs into one project. Project Names are limited to 30 characters. If not specified, the plugin will use the Run Name as the Project Name. Click the Check configuration button to see if you typed a valid username and password. When ready, click Save Changes to proceed.
Proceed with your Run Plan. The plugin will wait for the base calling to be finished before exporting the data to Partek Flow.
Once the Run Plan is executed, data will be automatically exported to the Partek Flow Server. In the Run Report, go to the Plugin Summary tab and the plugin status will be displayed. An example of a successful Plugin upload is shown in Figure 11.
To access the project, click on the Partek Flow hyperlink in the plugin results (Figure 11). You can also go directly to Partek Flow in a new browser window and access your account. In your Partek Flow homepage (Figure 12), you will now see the project created by the Partek Flow Uploader.
You can manually invoke the plugin from a completed run report. This allows you to export the data from the Torrent Server if you did not include the plugin in the original run plan. This also gives you the flexibility to export the same run results onto different project(s). Open the run report and scroll down to the bottom of the page (Figure 13). In the Plugin Summary tab, click the Select Plugins to Run button.
From the plugin list (Figure 14), select the PartekFlowUploader plugin.
Configure the Partek Flow Uploader. Enter the Partek Flow URL, your username and your password (Figure 15). These are the same credentials you use to access Partek Flow directly on a web browser. Although some fields may already be pre-populated depending on the global plugin configuration, you can edit the entries as needed. All fields are required to successfully run the plugin.
The Project Name field will be used in Partek Flow to create a new project where the run results will be exported. However, if a project with that name already exists, the samples will be added to that existing project. This enables you to combine multiple runs into one project. Project Names are limited to 30 characters. The default project name is the Run Name.
When ready, click Export to Partek Flow to proceed. If you wish to cancel, click on the X on the lower right of the dialog box.
Note that configuring the Plugin from a report (Figure 15) is very similar to configuring it as part of a Run Plan (Figure 10) with two notable differences:
The Check configuration has been replaced by Export to Partek Flow button, which when clicked, immediately proceeds to the export.
The Save changes button has been removed so any change in the configuration cannot be saved (compared to editing a run plan where plugin settings are saved)
Once the plugin starts running, it will indicate that it is Queued on the upper right corner of the Plugins Summary (Figure 16). There will also be a blue Stop button to cancel the operation.
Click the Refresh plugin status button to update. The plugin status will show Completed once the export is done and the data is available in Partek Flow (Figure 11).
The Partek Flow Uploader plugin sends the unaligned bam files to the Partek Flow server. For each file, a Sample of the same name will be created in the Data tab (Figure 17).
The data transferred by the Partek Flow Uploader is stored in a directory created for the Project within the user's default project output directory. For example, in Figure 17, the data for this project is stored in: /home/flow/FlowData/jsmith/Project_CHP Hotspot Panel.
The plugin transfers the Unaligned BAM data from the Torrent Browser. The UBAM file format retains all the information of the Ion Torrent Sequencer. In the Partek Flow Project, the Analyses tab would show a circular data node named Unaligned bam. Click on the data node and the context-sensitive task menu will appear on the right (Figure 18).
Unaligned BAM files are only compatible with the TMAP aligner, which can be selected in the Aligners section of the Task Menu. If you wish to use other aligners, you can convert the unaligned BAM files to FASTQ using the Convert to FASTQ task under Pre-alignment tools. Some information specific to Ion Torrent Data (such as Flow Order) are not retained in the FASTQ format. However, those are only relevant to Ion Torrent developed tools (such as the Torrent Variant Caller) and are not relevant to any other analysis tools.
Once converted, the reads can then be aligned using a variety of aligners compatible with FASTQ input (Figure 19). You can also perform other tasks such as Pre-alignment QAQC or run an existing pipeline. Another option is to include the Convert to FASTQ task in your pipeline and you can invoke the pipeline directly from an Unaligned bam data node.
GEO -
ENA -
If you need additional assistance, please visit to submit a help ticket or find phone numbers for regional support.
Phred Quality Score | Base Call Accuracy |
---|---|
10 | 90% |
20 | 99% |
30 | 99.9% |
40 | 99.99% |
The Pre-alignment QA/QC output table contains one input file per row, with typical metrics in columns (%GC: fraction of GC content; %N: fraction of no-calls) (Figure 3). The file names are hyperlinks leading to the sample-level reports. To save the table as a .txt file to a local computer, click the Download link. Table columns can be sorted using the double arrows icon ( ).
The Base composition plot specifies the relative abundance of each base per position (Figure 6), with N standing for no-calls. By selecting individual bases on the legend, you can remove them from the plot or bring them back. To zoom in, left-click and drag over a region of interest. To zoom out, use the Reset button ( ) to recreate the original view, or the magnifying glass ( ) to zoom out one level.
Each task will appear as a separate section on the Data summary report (Figure 2). The first section of the report (Sample data) will summarize the input samples information. Click the grey arrows ( / ) to expand and collapse each section. When expanded, the task name, user that performed the task, start date and time, duration and the output file size are displayed (Figure 2). To view or hide a table of task settings, click Show/hide details (Figure 3).
If you need additional assistance, please visit to submit a help ticket or find phone numbers for regional support.
The Coverage report result page contains a project-level overview and starts with a summary table, with one sample per row (Figure 2). The first few columns show the percentage of bases in the genomic features that are covered at the specified level or higher (default: 1×, 20×, 100×). Average coverage is defined as the sum of base calls over each base in the genomic features, divided by the length of the genomic features. Similarly, Average quality is defined as the sum of the average qualities of the bases covering the genomic features, divided by the length of the covered genomic features. The last two columns show the number of On-target reads (overlapping the genomic features) and Off-target reads (not overlapping the features). The Optional columns link enables import of any meta-data present in the data table ().
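The definitions of average coverage and on/off-target reads can be made concrete with a small sketch. These helpers are illustrative only; intervals are assumed to be half-open `(start, end)` tuples:

```python
def average_coverage(depths, feature_length):
    """Average coverage as defined above: the sum of per-base read
    depths across the genomic features, divided by the total feature
    length."""
    return sum(depths) / feature_length

def classify_reads(reads, features):
    """Count on-target reads (overlapping any genomic feature) and
    off-target reads. Intervals are half-open (start, end) tuples."""
    on = sum(any(r[0] < f[1] and f[0] < r[1] for f in features)
             for r in reads)
    return on, len(reads) - on

# A 4-bp feature with depths 2,3,0,1 -> average coverage 1.5
print(average_coverage([2, 3, 0, 1], 4))
# One read overlaps the feature (4, 8); one does not
print(classify_reads([(0, 5), (10, 12)], [(4, 8)]))
```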
The browser icon in the right-most column () of the Region average coverage summary table opens the Coverage graph for the respective region (Figure 5). The horizontal axis is the normalized position within the genomic feature, represented as the 1st to 100th percentile of the feature's length. The vertical axis is coverage. Each line on the plot is a single sample, and the samples are listed below the plot.
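The percentile normalization of the horizontal axis can be sketched as follows. This is a hypothetical helper (not Partek Flow's implementation) that rescales a per-base coverage track to a fixed number of bins so features of different lengths share one x-axis:

```python
def percentile_profile(depths, bins=100):
    """Rescale a per-base coverage track into a fixed number of
    percentile bins (default 100), averaging the depths that fall
    into each bin."""
    n = len(depths)
    profile = []
    for b in range(bins):
        start = b * n // bins
        end = max(start + 1, (b + 1) * n // bins)
        window = depths[start:end]
        profile.append(sum(window) / len(window))
    return profile

# A uniformly covered 200-bp feature flattens to 100 bins of 1.0
print(percentile_profile([1] * 200)[:5])
```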
: invokes the Coverage graph across the genomic feature, showing the current sample only (Figure 23) (or mouse over to get a preview of the plot)
: invokes the Chromosome view and browses to the genomic location
Ribosomal genes are defined as genes that code for proteins in the large and small ribosomal subunits. Ribosomal genes are identified by searching their gene symbols against a list of 89 L & S ribosomal genes taken from . The search is case-insensitive and includes all known gene name aliases from HGNC. Identification of ribosomal genes is performed independently of the gene annotation file specified.
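The case-insensitive matching described above can be sketched as follows. The alias set shown is a tiny illustrative sample, not the actual 89-gene list:

```python
def is_ribosomal(symbol, ribosomal_aliases):
    """Case-insensitive check of a gene symbol against a set of
    ribosomal gene names and aliases (illustrative sketch; the real
    list contains 89 L & S genes plus their HGNC aliases)."""
    return symbol.upper() in {a.upper() for a in ribosomal_aliases}

# Hypothetical subset of the alias list for demonstration
aliases = {"RPL3", "RPS6", "Rpl13a"}
print(is_ribosomal("rpl3", aliases))   # matches regardless of case
print(is_ribosomal("GAPDH", aliases))  # not a ribosomal gene
```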
To filter to the high-quality cells, click the include selected cells icon under Filter in the top right of Select & Filter, then click Apply observation filter... (Figure 9).
The List filter lets you filter to the features included in a feature list. To learn more about feature lists in Partek Flow, please see .
Click Select File (Figure 4) and use the file browser to select the zip file you downloaded from the . Click Upload and Install.
Reads that had no detectable barcodes are combined into a sample with the prefix nomatch_: nomatch_rawlib.basecaller.bam (Row 17 in Figure 17). You can remove this sample from your analysis by clicking the gear icon next to the sample name and choosing Delete sample.
10 | 90% |
20 | 99% |
30 | 99.9% |
40 | 99.99% |
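The table above follows from the standard Phred definition: a quality score Q corresponds to an error probability P = 10^(-Q/10), so base call accuracy is 1 - P. A minimal sketch:

```python
def phred_accuracy(q):
    """Base call accuracy for a Phred quality score Q.
    Error probability is P = 10 ** (-Q / 10); accuracy is 1 - P."""
    return 1.0 - 10 ** (-q / 10)

# Reproduce the table above
for q in (10, 20, 30, 40):
    print(f"Q{q}: {phred_accuracy(q):.4%}")
```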