Partek

Overview

Partek software enables researchers to perform genomic data analysis without writing a single line of code and without sacrificing statistical power or advanced functionality. From alignment to pathway analysis, Partek offers a seamless, integrated analysis solution on a single platform that provides the power of a cloud or cluster when needed, and the convenience of desktop software for less compute-intensive tasks.

Here you will find documentation on how to use and administer our products.

Partek Flow

Frequently Asked Questions

General

How do I create a project?

To create a project, first transfer files to the Partek Flow server, then import the files into your project using the import data wizard. Here is the video and more information.

Can I change my user avatar?

Yes. Click your avatar at the top right corner of the interface, select Settings, then choose Profile. On the My profile page, click the "Change image" button.

How do I add and use my own lists?

Click your avatar in the top right corner of the Partek Flow interface, choose Settings in the menu, and select Lists from the left panel of the Components section. Lists can also be generated from result tables using the "Save as managed list" button. For more information please click here.

Can I repeat a task and everything downstream of it, while changing only one/a few parameters?

Yes. Click the rectangular task node whose parameters you want to change. In the context-specific menu on the right, under Task actions, select ‘Rerun with downstream tasks’. This brings you to the task setup page, where you can edit the parameters for the task; then click Finish to run the task with the new parameters. The tasks downstream of it will be initiated automatically.

What can I use to identify cells that are actively expressing genes within a gene list?

Use AUCell to identify cells with active gene sets; this task calculates a value for each cell by ranking all genes by their expression level in the cell and identifying what proportion of the genes from the gene list fall within the top 5% (default cutoff) of genes. An alternative option is to use the Gene score for a feature list to select and filter populations based on the distribution; click here for more information.
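The ranking idea described above can be sketched in a few lines of Python. This is a simplified illustration of the proportion described in this answer, not the AUCell implementation used by Partek Flow; the function name and the example cell are hypothetical.

```python
def top_fraction_score(expression, gene_list, cutoff=0.05):
    """Proportion of gene_list found among the top `cutoff` fraction of
    genes in one cell, ranked by expression (highest first)."""
    # Rank all genes in this cell from highest to lowest expression.
    ranked = sorted(expression, key=expression.get, reverse=True)
    # Keep the top 5% of genes by default (always at least one gene).
    n_top = max(1, int(len(ranked) * cutoff))
    top_genes = set(ranked[:n_top])
    genes = set(gene_list)
    return len(genes & top_genes) / len(genes)


# Hypothetical cell with 100 genes; g0 is the most highly expressed.
cell = {f"g{i}": 100 - i for i in range(100)}
# g0 and g3 are in the top five genes, g50 and g99 are not.
score = top_fraction_score(cell, ["g0", "g3", "g50", "g99"])
```

A cell with a score close to 1 has most of the listed genes among its most highly expressed genes.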

Can I build and use pipelines for my analysis?

Yes. Click Import a pipeline at the bottom of the Analyses tab dashboard. This lets you import either one of our hosted pipelines or your own saved pipeline, which can be found under Settings -> Components -> Pipelines. Click here for steps to save and run a pipeline. For more information related to navigating pipelines click here.

How do I classify cells?

Classification in Partek Flow can be performed manually or with automatic cell classification, which is explained in more detail here. Users often want to classify cells by gene expression threshold(s); for details on classification by marker expression, click here. Automatic classification needs to be performed on a non-normalized single cell data node; once it is complete, publish the cell attributes to the project, then use this classification in visualizations and tasks. You may also perform Graph-based clustering and K-means clustering to help identify biomarkers that can then be used to identify the clusters. We also provide hosted lists for different cell types.

My server is full, how do I make more space?

We recommend cleaning up projects as well as removing library files that you do not need, then removing the orphaned files. You can also export analyzed projects and save them on an external machine, then when you need them again you can import them to the server. Please see this information for more details related to: Project management, Removing library files, and Orphaned files. Right click on the data node to delete files from projects that are not needed (e.g. fastqs from project pipelines that are analyzed); you will not be able to perform tasks from this node once the files are deleted.

How do I add library files if I am not studying human or mouse?

To add a new assembly, click on Settings -> Library files. From the Assembly drop-down list, select Add assembly and specify the species. If the species name is not in the list, choose Other and type in the name with the assembly version (multiple assembly versions can exist for one species, e.g. hg19 and hg38 for Homo sapiens). You need to add the reference file, which is a .fasta file containing sequence information. Once the reference file is added, you can build any aligner index to perform the alignment task.

The Annotation model is a file containing feature locations. This file can be used by the Quantify to annotation model task in RNA-Seq analysis, or to annotate variants or peaks in a DNA-Seq or ATAC-Seq/ChIP-Seq data analysis pipeline. The file format should be .gtf/.gff/.bed.

We recommend looking for the species files on the Ensembl website. There is no need to unzip or save these files to your local machine, instead right click and copy the link address of the specific file (not a link to a folder). For more details, here is the documentation chapter: Library File Management.

Are Genome coordinates 1-based or 0-based?

Genome coordinates for annotation models stored in Partek Flow are 1-based, start-inclusive, and stop-exclusive. This means that the first base position starts from one, the start coordinate for a feature is included in the feature and the stop/end coordinate is not included in the feature. These are the genome coordinates that are printed in various task reports and output files when an annotation model is involved in the task. When custom annotation files are added to Partek Flow, the genome coordinates are converted into this format. The coordinates are converted back if necessary for a specific task. shows how the genome coordinates vary between different annotation formats.
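As a rough illustration of the convention described above (1-based, start-inclusive, stop-exclusive), the sketch below converts intervals from two common annotation formats. It relies only on the published BED and GTF/GFF conventions; the function names are hypothetical and this is not Partek Flow's own conversion code.

```python
def bed_to_flow(start, end):
    """BED intervals are 0-based, start-inclusive and end-exclusive,
    so both coordinates shift by one."""
    return start + 1, end + 1


def gtf_to_flow(start, end):
    """GTF/GFF intervals are 1-based with both ends inclusive,
    so only the stop coordinate shifts by one."""
    return start, end + 1


# The first 100 bases of a chromosome, as written in each format.
from_bed = bed_to_flow(0, 100)
from_gtf = gtf_to_flow(1, 100)
```

Both calls describe the same 100 bases, so both give a start of 1 and a stop of 101; the feature length is always stop minus start.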

Can I add transgenes to my reference files?

Yes. To add transgenes (including GFP or related) to the reference files, first choose an assembly, create the transgene reference, and merge the references together (e.g. combine mm10 with dttomato). The process is the same for the annotation file.

How do I export data from the result nodes?

Left click to select the data node you want to export. At the bottom of the task menu there will be an option to Download data.

When working with paired-end data, FPKM should be available; when working with single-end data, RPKM should be available. The two metrics are essentially analogous and differ only in the underlying calculation: for paired-end data, two reads mapping to one fragment are counted once rather than twice. Here is a simple description of the differences in calculation between RPKM and FPKM: http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained.
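The calculation can be sketched as follows, using the standard definition of reads (or fragments) per kilobase of feature per million mapped reads (or fragments). The function name and the numbers are illustrative only, not taken from Partek Flow.

```python
def per_kilobase_per_million(count, feature_length_bp, total_count):
    """RPKM when the counts are reads, FPKM when the counts are fragments."""
    per_kilobase = count / (feature_length_bp / 1000)
    return per_kilobase / (total_count / 1_000_000)


# Single-end: 500 reads on a 2,000 bp gene, 10 million mapped reads.
rpkm = per_kilobase_per_million(500, 2000, 10_000_000)

# Paired-end: the same library sequenced as pairs gives twice as many
# reads, but each pair is counted once as a single fragment.
fpkm = per_kilobase_per_million(1000 // 2, 2000, 20_000_000 // 2)
```

In this idealized example every pair maps as one fragment, so the two metrics give the same value.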

Why don't you have RPM in your normalization?

RPM (reads per million) is the same as Total Count. Please use Total Count.

What is a canonical transcript?

For genes with multiple transcripts, one of the transcripts is picked as the canonical transcript. Based on the UCSC definition from the table browser,

knownCanonical - identifies the canonical isoform of each cluster ID, or gene. Generally, this is the longest isoform.

we define the canonical transcript as either the longest CDS (coding DNA sequence) if the gene has translated transcripts, or the longest cDNA.
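A minimal sketch of this selection rule is shown below. The record layout (`id`, `cds_length`, `cdna_length`) is hypothetical, and the source does not say how ties are broken; here the first transcript with the maximum length is kept.

```python
def pick_canonical(transcripts):
    """Return the id of the canonical transcript of one gene.

    Each transcript is a dict with 'id', 'cds_length' and 'cdna_length';
    a cds_length of 0 marks a transcript that is not translated.
    """
    coding = [t for t in transcripts if t["cds_length"] > 0]
    if coding:
        # The gene has translated transcripts: take the longest CDS.
        return max(coding, key=lambda t: t["cds_length"])["id"]
    # No translated transcripts: fall back to the longest cDNA.
    return max(transcripts, key=lambda t: t["cdna_length"])["id"]


coding_gene = [
    {"id": "T1", "cds_length": 900, "cdna_length": 2500},
    {"id": "T2", "cds_length": 1200, "cdna_length": 1800},
    {"id": "T3", "cds_length": 0, "cdna_length": 3000},
]
noncoding_gene = [
    {"id": "N1", "cds_length": 0, "cdna_length": 700},
    {"id": "N2", "cds_length": 0, "cdna_length": 950},
]
```

For `coding_gene` the rule picks T2 (longest CDS) even though T3 has the longest cDNA; for `noncoding_gene` it picks N2.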

Why are there decimal values in the Partek E/M quantification output?

The Partek E/M quantification algorithm can give decimal values because of multi-mapping reads (the same read potentially aligning to multiple locations) and overlapping transcripts/genes (a read that maps to a location with multiple transcripts or genes at that location). In these scenarios, the read count will be split.

For example, if a read maps to two potential locations, then that read contributes 0.5 counts to the first location and 0.5 counts to the second location. Similarly, if a read maps to one location with two overlapping genes, then that read contributes 0.5 counts to the first gene and 0.5 counts to the second gene.

If you need to remove the decimal points for downstream analysis outside of Partek Flow, you can round the values to the nearest integer.
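The splitting described in the example above can be sketched as follows. The equal split mirrors the 0.5/0.5 example in this answer and is an assumption made for illustration; this is not the Partek E/M algorithm itself, and the function name is hypothetical.

```python
from collections import defaultdict


def fractional_counts(read_assignments):
    """Split each read equally among the genes it is compatible with."""
    counts = defaultdict(float)
    for targets in read_assignments:
        share = 1.0 / len(targets)
        for target in targets:
            counts[target] += share
    return dict(counts)


# One uniquely assigned read and two reads shared between two genes.
reads = [["geneA"], ["geneA", "geneB"], ["geneB", "geneC"]]
counts = fractional_counts(reads)

# Rounding for tools that expect integers. Python's round() rounds
# halves to the nearest even integer, so 0.5 becomes 0 and 1.5 becomes 2.
rounded = {gene: round(value) for gene, value in counts.items()}
```

If your downstream tool expects halves to round up, apply that rule explicitly instead of relying on the default rounding of your language.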

Why is the number of variants listed in the variant reports and summarize cohort mutations report different?

For variants with multiple alternative alleles, the variant report has one row for all alternative alleles, while the summarize cohort mutations report lists each alternative allele on a separate row. The number of variants listed at the top of each report is calculated from the number of rows in the report.

Visualization

How do I order my heatmap by the cell types?

If you would like specific groups (e.g. cell types) in a certain order, do not perform Hierarchical clustering on these cells and instead choose to assign order, then use click and drag to reorder the groups. If you want to remove a group, you can choose to exclude this group in the filtering section. You can still perform Hierarchical clustering on the features if you would like to. Hierarchical clustering will force the heatmap to cluster and you would need to click the dendrogram nodes to switch the order. Click here for more information.

How do I display UMAP for each sample in the Data Viewer?

For a multi-sample project, all of the downstream tasks will be run separately if 'Split by sample' was checked when performing the PCA task. Visualization of different samples can be displayed by 'Sample' using the 'Misc' section in the Axes card. To show different samples side by side, one can click 'Duplicate plot' first, then use the 'Sample' option to switch the samples.

Can I visualize fold change values on a heatmap without using a z-score?

Yes. The default settings can be modified by clicking "Configure" in the Advanced settings during task set-up, then changing the "feature scaling" option to "none" to plot the values without scaling. For more information related to the heatmap click here.

Why don't I see Flip mode on the heatmap? Why can't I download all of the data after zooming?

The Flip mode and download all data options are disabled if there are more than 2.5 million values (rows x columns) in the heatmap.

How do I label gene names on a volcano plot?

By default, genes are selected if the p-value is <= 0.05 and |fold change| >= 2, and they are labeled when fewer than 2000 genes are selected. To change the label, click the Style button in the Configure section and choose a gene annotation field from the Label by drop-down list. If the number of selected genes is less than or equal to 100, Partek Flow will try to spread out the labels as much as possible to display them clearly. If more than 100 genes are selected, labels are placed next to the selected genes and will overlap where genes are close together. If more than 2000 genes are selected, no labels are displayed.

If you click any blank space, the selection is turned off; you can then use the selection mode buttons on the vertical bar at the upper-right corner of the plot to manually select dots on the plot.

Statistics

Why do I get "?" for FDR p-values in my DESeq2 result?

When a feature (gene) has low expression, it is filtered out by automatic independent filtering. To avoid this, you can either filter features to exclude low expression features before running DESeq2, or change the apply independent filtering setting in the DESeq2 advanced options. Details about independent filtering can be found in the DESeq2 documentation.

Click here for troubleshooting other differential analysis models and "?" results

What is fold change?

Fold change indicates the extent of increase or decrease in feature expression in a comparison. In Partek Flow, fold change is in linear scale (even if the input data is in log scale). It is converted from the ratio, which is the LSmean of group one divided by the LSmean of group two in your comparison. When the ratio is greater than 1, fold change is identical to the ratio; when the ratio is less than 1, fold change is -1/ratio. There is no fold change value between -1 and 1. When the ratio/fold change is 1, there is no change between the two groups.

The Log ratio option in Partek Flow is also converted from the ratio; it is comparable to the log fold change reported by some other tools.
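The conversion from ratio to fold change can be written out as a short sketch. The function names are hypothetical, and the use of base 2 for the log ratio is an assumption, since the base is not stated here.

```python
import math


def fold_change(ratio):
    """Convert a ratio (LSmean of group one / LSmean of group two)
    to a signed, linear-scale fold change."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Ratios of 1 or more are reported as-is; ratios below 1 become
    # -1/ratio, so no value ever falls between -1 and 1.
    return ratio if ratio >= 1 else -1.0 / ratio


def log2_ratio(ratio):
    """Log ratio, assuming base 2 (the base is an assumption)."""
    return math.log2(ratio)


up = fold_change(4.0)        # group one is 4 times higher
down = fold_change(0.25)     # group one is 4 times lower
unchanged = fold_change(1.0)
```

A ratio of 4 and a ratio of 0.25 give fold changes of 4 and -4, while the corresponding log ratios are symmetric around zero (2 and -2).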

Can I label a Volcano plot with gene names?

Yes. Go to Style in the Data Viewer and make sure Gene name is selected under "Labeling". Next, go to the in-plot selection tools (right side of the graphic) and use any of the selection tools to select the genes that you would like to label. You can use ctrl or shift to select multiple populations at once. For more information on the Volcano plot click here.

In the Volcano plot, what does the inconclusive group mean?

By default, Flow uses p-value <= 0.05 and |fold change| >= 2 as the significance cutoffs. Genes that meet both the p-value and fold change cutoffs are significantly up- or down-regulated. Genes that meet only one criterion are called inconclusive. Genes that meet neither criterion are not significant. To change the cutoffs, click the Statistics button in the Configure section of the left control panel. Click the Style button to change the color of the significance categories.
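The categories can be expressed as a small decision rule. This is a sketch of the logic described in this answer with the default cutoffs; the function name and category labels are hypothetical.

```python
def significance_category(p_value, fold_change, p_cutoff=0.05, fc_cutoff=2.0):
    """Classify one gene for the volcano plot."""
    passes_p = p_value <= p_cutoff
    passes_fc = abs(fold_change) >= fc_cutoff
    if passes_p and passes_fc:
        # Both cutoffs met: the sign of the fold change gives the direction.
        return "up" if fold_change > 0 else "down"
    if passes_p or passes_fc:
        # Exactly one cutoff met.
        return "inconclusive"
    return "not significant"
```

For example, a gene with a p-value of 0.01 and a fold change of 1.5 is inconclusive, because it passes the p-value cutoff but not the fold change cutoff.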

What is the difference between FDR and FDR step up?

FDR is the expected proportion of false discoveries among all discoveries. FDR step-up is a particular method to keep the FDR under a given level, alpha, that was proposed in this paper. In Partek Flow, if one calls all of the features with p-values of 0.02 or less, the FDR is less than or equal to 0.41.
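The sketch below shows one common step-up method, the Benjamini-Hochberg procedure. Treating FDR step-up as Benjamini-Hochberg is an assumption here, since the method is not named in this answer, and the function name is hypothetical.

```python
def step_up_fdr(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: 'FDR step-up' refers to the Benjamini-Hochberg procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step from the largest p-value down to the smallest, so that the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = step_up_fdr([0.01, 0.04, 0.03, 0.5])
```

Calling every feature whose adjusted value is at or below alpha keeps the expected proportion of false discoveries at or below alpha under the assumptions of the procedure.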

How do I perform a paired t-test in Flow?

You should have at least the following two attributes in the Metadata: treatment (with two subgroups) and subject ID (to pair the two samples). When performing differential analysis, choose ANOVA and include both attributes in the ANOVA model; this two-way ANOVA is mathematically equivalent to a paired t-test.
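The equivalence can be checked numerically: for two treatment groups, the squared paired t statistic equals the F statistic for treatment in a two-way ANOVA with treatment and subject as factors and no interaction term. The sketch below uses made-up measurements and hypothetical function names; it illustrates the statistics only and is not how Partek Flow computes its results.

```python
import math


def paired_t(before, after):
    """Paired t statistic for matched measurements."""
    n = len(before)
    diffs = [a - b for a, b in zip(after, before)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def treatment_f(before, after):
    """F statistic for treatment in a two-way ANOVA
    (treatment + subject, no interaction)."""
    n = len(before)
    grand = (sum(before) + sum(after)) / (2 * n)
    treat_means = [sum(before) / n, sum(after) / n]
    subj_means = [(b + a) / 2 for b, a in zip(before, after)]
    # Treatment sum of squares has 1 degree of freedom.
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    # Residual sum of squares has (n - 1) * (2 - 1) degrees of freedom.
    ss_error = sum(
        (x - subj_means[i] - treat_means[j] + grand) ** 2
        for i, pair in enumerate(zip(before, after))
        for j, x in enumerate(pair)
    )
    return ss_treat / (ss_error / (n - 1))


# Made-up expression values for four subjects, before and after treatment.
before = [5.1, 4.8, 6.0, 5.5]
after = [6.2, 5.1, 6.9, 6.8]
t_stat = paired_t(before, after)
f_stat = treatment_f(before, after)
```

Here `t_stat ** 2` and `f_stat` agree up to floating point error, so the two tests give the same p-value for the treatment effect.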

Can I compare one attribute at a time versus all of the others combined?

Yes, you can use the Compute biomarkers task to compare one subgroup at a time to all of the others combined. An alternative option is to set up the differential analysis model in this way; for more information please see the information here for each model.

I downloaded gene counts from the output data node generated by the Quantify to annotation model task, why can't I find my genes of interest?

In the Quantifying to an annotation model dialog, by default, Partek Flow filters features based on the total count across all of the samples and features with a total count greater than 10 will be reported. If you want to report all of the genes in the annotation file, change the Filter features value to 0.

Biological Interpretation

What is the difference between GSEA and Gene Set Enrichment?

In Partek Flow, GSEA should be performed on a sample/cell and feature matrix data node (e.g. normalized count data). GSEA is used to detect a gene set/pathway that is significantly different between two groups. Gene set enrichment should be performed on a filtered gene list; it uses Fisher's exact test to identify overrepresented gene sets/pathways based on the filtered gene list. The input data is a filtered list using gene names.

What is the enrichment score shown in the Gene Set Enrichment report?

The enrichment score shown in the enrichment report is the negative natural log of the enrichment p-value derived from Fisher's exact test. The higher the enrichment score, the more overrepresented the list of genes is in the gene set of a GO/pathway category.
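As a worked illustration, the sketch below computes a Fisher's exact test p-value from the hypergeometric distribution and converts it to an enrichment score. The one-sided (overrepresentation) form of the test, the function names, and the numbers are assumptions made for this example; `math.comb` requires Python 3.8 or later.

```python
import math


def fisher_enrichment_p(k, list_size, set_size, background):
    """Probability of finding at least k gene-set members in a list of
    list_size genes drawn from `background` genes (one-sided test)."""
    total = math.comb(background, list_size)
    upper = min(list_size, set_size)
    tail = sum(
        math.comb(set_size, i) * math.comb(background - set_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def enrichment_score(p_value):
    """Negative natural log of the enrichment p-value."""
    return -math.log(p_value)


# Toy example: 10 background genes, a gene set of 5, and a filtered
# list of 5 genes that are all members of the gene set.
p = fisher_enrichment_p(k=5, list_size=5, set_size=5, background=10)
score = enrichment_score(p)
```

Here the p-value is 1/252, which corresponds to an enrichment score of about 5.5; a p-value of 0.05 corresponds to a score of about 3.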

In KEGG pathway, genes can be colored by Fold change and p-value etc, how are the gene statistics calculated?

For Gene set enrichment analysis, only genes from the input data node (filtered gene list) will be colored in the KEGG pathway gene network, using the statistics in the data node.

During GSEA (or Gene set ANOVA) computation, we also perform ANOVA on each gene based on the selected attribute, independently of the GSEA computation (which is at the gene set level). The results of the ANOVA are only used to color the genes in the KEGG gene network. If GSEA is computed using another database, e.g. GO, we don't compute ANOVA on each gene since the GO database doesn't have gene network information.

When should I use GSEA or Gene set ANOVA?

Both methods should be performed on a normalized matrix data node and require gene symbols in the feature annotation. Both methods detect differentially expressed gene sets (pathways) rather than individual genes, but the algorithms are different. GSEA is a popular method from the Broad Institute. Gene set ANOVA is based on a generalized linear model; here are the details.

Quick Start Guide

Overview

This guide gives the basics of Partek Flow usage. Partek Flow can be installed on a server, on a computer cluster, or on the cloud. Regardless of where it's installed, it can be viewed using any web browser. We recommend using Google Chrome.

This guide covers:

Logging in to your Partek Flow account will bring up the Home page (Figure 1). This page will show recent activities you've performed, recent projects you've worked on and pertinent details about each project.

Starting a new project

Uploading your dataset

Select the type of data (Single cell, Bulk, Other), choose the assay type, and select the data format. Partek Flow accepts various data types. Use the Next button to proceed with import.

There are three ways you can upload the data:

From your Partek Flow server (click here for more information)
From a URL
From a GEO / ENA Bioproject (click here for more information)

Because genomics datasets are generally large, it is ideal to have the data copied in a folder directly accessible to the Partek Flow server. Make sure that the directory has the appropriate permissions for Partek Flow to read and write files in that folder. You may wish to seek assistance from your system administrator in uploading your data directly.

Select the files you would like to create samples from. Once they've been created, assign the corresponding sample attributes for each sample using the Metadata tab. The most efficient way to assign sample attributes is by clicking Assign sample attributes from a file and uploading a tab delimited text file. The file should contain a table with the following:

The first row lists the attribute names (e.g. Treatment, Exposure)
The first column lists the sample names (the sample names in the file must be identical to the ones listed in the Sample name column in the Data tab)
The succeeding columns list the corresponding attributes for each sample
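A file with this layout can be produced in a spreadsheet program or with a short script. The sketch below writes one with Python's standard csv module; the function name, the sample names, the attribute values, and the "Sample name" label in the top-left cell are all hypothetical.

```python
import csv


def write_attribute_file(path, attribute_names, samples):
    """Write a tab-delimited sample attribute table.

    samples maps each sample name to its attribute values, listed in
    the same order as attribute_names.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # First row: attribute names. The label of the first cell
        # ("Sample name") is an assumption.
        writer.writerow(["Sample name"] + list(attribute_names))
        # One row per sample: the name first, then its attributes.
        for sample_name, values in samples.items():
            writer.writerow([sample_name] + list(values))
```

For example, `write_attribute_file("attributes.txt", ["Treatment", "Exposure"], {"S1": ["Drug", "24h"], "S2": ["Control", "24h"]})` writes a header row followed by one row per sample. The sample names must match the ones shown in the Data tab.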

Basic Partek Flow layout

The Analyses Tab

After samples have been added and associated with valid data files, a data node will appear in the Analyses tab (Figure 3). The Analyses tab is where you can invoke tasks, using the context sensitive menu on the right, and view the results of your analysis.

To add more data use the Add data task in the menu on the right or Add data in the Metadata tab. Once a task is performed, data can no longer be added to the project.

Data and task nodes

The Analyses tab contains two elements: data nodes (circles) and task nodes (rounded rectangles) connected by lines and arrows. Collectively, they represent a data analysis pipeline (Figure 4).

Performing tasks

Clicking a data node brings up a context sensitive menu on the right (Figure 5). This menu changes depending on the type of data node. It will only present tasks which can be performed on that specific data type. Hover over the task to obtain additional information regarding each option.

Depending on the task, a new data node may automatically be created and connected to the original data node. This contains the data resulting from the task. Tasks that do not produce new data types, such as Pre-alignment QA/QC, will not produce an additional data node.

To view the results of a task, click the task node and choose the Task report option on the menu.

Saving visualizations

Downloading your data

Data associated with any data node can be downloaded by clicking the node and choosing Download data at the bottom of the task menu (Figure 8). Compressed files will be downloaded to the local computer from which the user is accessing the Partek Flow server. Note that bigger files (such as unaligned reads) take longer to download. For guidance, a file size estimate is provided for each data node. Downloaded files can be seamlessly imported into Partek® Genomics Suite®.

Partek Flow in action

Additional Assistance

Getting Started with Your Partek Flow Hosted Trial

Ready to start work on your Partek Flow hosted trial? This page has some helpful videos to get you started!

Uploading Your Data to a Hosted Instance of Partek Flow
How to Get Started on your First Project

Don't have a trial server yet? Request one on our website.

Uploading Your Data to a Hosted Instance of Partek Flow

This short video shows you how to import your data into a hosted instance of Partek Flow. Adjust your device's volume for optimal sound.

Note: Uploading a large amount of data might take a while. Please turn off your computer's sleep mode while the upload is running!

How to Get Started on your First Project

In this short video, we'll give you an overview of the interface and how to get started with your analysis.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Installation Guide

Partek Flow is a web-based application for genomic data analysis and visualization. It can be installed on a desktop computer, computer cluster or cloud. Users can then access Partek Flow from any browser-enabled device, such as a personal computer, tablet or smartphone.

Read on to learn about the following installation topics:

Minimum System Requirements
Single Cell Toolkit System Requirements
Single Node Installation
Single Node Amazon Web Services Deployment
Multi-Node Cluster Installation
Creating Restricted User Folders within the Partek Flow Server
Updating Partek Flow
Uninstalling Partek Flow
Dependencies
Docker and Docker-compose
Java KeyStore and Certificates
Kubernetes

Minimum System Requirements

Web Browser Requirements

Regardless of whether Partek Flow is installed on a server or the cloud, users will be interacting with the software using a web browser. We support the latest Google Chrome, Mozilla Firefox, Microsoft Edge and Apple Safari browsers. While we make an effort to ensure that Partek Flow is robust, note that some browser plugins may affect the way the software is viewed on your browser.

Web Browser Requirements
Hardware Requirements (Single-node Linux)
Hardware Requirements (Cluster or Cloud)
Storage Recommendations

Hardware Requirements (Single-node Linux)

If you are installing Partek Flow on your own single-node server, we require the following for successful installation:

Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8 or later versions of these distributions
64-bit 2GHz quad-core processor [1]
48GB of RAM [2]
> 2TB of storage available for data
> 100GB on the root partition
A broadband internet connection

We support Docker-based installations. Please contact support@partek.com for more information.

[1] Note that some analyses have higher system requirements. For example, running the STAR aligner on a reference genome of size ~3 GB (such as human, mouse or rat) requires 16 cores.

[2] Input sample file size can also impact memory usage, which is particularly the case for TopHat alignments.

Increasing hardware resources (cores, RAM, disk space, and speed) will allow for faster processing of more samples.

If you are licensed for the Single Cell Toolkit, please see Single Cell Toolkit System Requirements for amended hardware requirements.

Hardware Requirements (Cluster or Cloud)

Please contact Partek Technical Support if you would like to install Partek Flow on your own HPC or cloud account. We will assist in assessing your hardware needs and can make recommendations regarding provisioning sufficient resources to run the software.

Storage Recommendations

Proper storage planning is necessary to avoid future frustration and costly maintenance. Here are several DO's and DO NOT's:

DO:

Plan for at least 3 to 5 times more storage than you think is necessary. Investing extra funds in storage and storage performance is always worth it.
Keep all Flow data on a single partition that is expandable, such as RAID or LVM.
Back up your data, especially the Partek Flow database.

DO NOT:

Store data on 'removable' USB drives. Partek Flow will not be able to see these drives.
Store data across multiple partitions or folder locations. This will increase the maintenance burden substantially.
Use non-Linux file systems like NTFS.

Single Cell Toolkit System Requirements

Up to 100,000 cells per analysis
More than 100,000 cells per analysis

Because of the large size of single cell RNA-Seq data sets and the computationally-intensive tools used in single cell analysis, we have amended our system requirements and recommendations for installations of Partek Flow with the Single Cell toolkit.

Up to 100,000 cells per analysis

Required

Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 64 GB of RAM
Local scratch space: 1 TB with cached or native speeds of 2 GB/s or higher
Storage: > 2 TB available for data and > 100 GB on the root partition

More than 100,000 cells per analysis

Required

Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 256 GB of RAM
Local scratch space: 2 TB with cached or native speeds of 2 GB/s or higher
Storage: > 4 TB available for data

Single Node Installation

This document describes how to set up and configure a single-node Partek Flow license.

Single Node Amazon Web Services Deployment

Creating a New Elastic Compute Cloud Instance for Partek Flow Software

Note: This guide assumes that none of the items necessary for the Amazon Elastic Compute Cloud (EC2) instance exist yet, such as the Amazon Virtual Private Cloud (VPC), subnets, and security groups, so their creation is covered as well.

Click on EC2

Switch to the region intended to deploy Partek Flow software. This tutorial uses US East (N. Virginia) as an example.

On the left menu, click on Instances, then click the Launch Instance button. The Choose an Amazon Machine Image (AMI) page will appear.

Click the Select button next to Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-f4cc1de2. NOTE: Please use the latest Ubuntu AMI. It is likely that the AMI listed here will be out of date.

Choose an Instance Type. The selection depends on your budget and the size of the Partek Flow deployment. We recommend m4.large for testing or cluster front-end operation, m4.xlarge for standard deployments, and m4.2xlarge for alignment-heavy workloads with a large user base. See the section AWS instance type resources and costs for assistance with choosing the right instance. In most cases, the instance type and associated resources can be changed after deployment, so you are not locked into the choices made in this step.

NOTE: New instance types will become available. Please use the latest mX instance type provided as it will likely perform better and be more cost effective than older instance types.

On the Configure Instance Details page, make the following selections:

Set the number of instances to 1. An autoscaling group is not necessary for single-node deployments.
Purchasing Option: Leave Request Spot Instances unchecked. Spot instances are relevant only for minimizing the cost of Partek Flow cluster deployments.
Network: If you do not have a virtual private cloud (VPC) already created for Partek Flow, click Create New VPC. This will open a new browser tab for VPC management.
- Use the following settings for the VPC:
  - Name Tag: Flow-VPC
  - IPv4 CIDR block: 10.0.0.0/16
  - Select No IPv6 CIDR Block
  - Tenancy: Default
- Click Yes, Create. You may be asked to select a DHCP Option set. If so, then make sure the dynamic host configuration protocol (DHCP) option set has the following properties:
  - Options: domain-name = ec2.internal;domain-name-servers = AmazonProvidedDNS;
  - DNS Resolution: leave the defaults set to yes
  - DNS Hostname: change this to yes as internal DNS resolution may be necessary depending on the Partek Flow deployment
- Once created, the new Flow-VPC will appear in the list of available VPCs. The VPC needs additional configuration for external access. To continue, right click on Flow-VPC and select Edit DNS Resolution, select Yes, and then Save. Next, right click the Flow-VPC and select Edit DNS Hostnames, select Yes, then Save.
- Make sure the DHCP option set is set to the one created above. If it is not, right-click on the row containing Flow-VPC and select Edit DHCP Option Sets.
- Close the VPC Management tab and go back to the EC2 Management Console.
Click the refresh arrow next to Create New VPC and select Flow-VPC.
Click Create New Subnet and a new browser tab will open with a list of existing subnets. Click Create Subnet and set the following options:
- Name Tag: Flow-Subnet
- VPC: Flow-VPC
- VPC CIDRs: This should be automatically populated with the information from Flow-VPC
- Availability Zone: It is OK to let Amazon choose for you if you do not have a preference
- IPv4 CIDR block: 10.0.1.0/24
Stay on the VPC Dashboard Tab and on the left navigation menu, click Internet Gateways, then click Create Internet Gateway and use the following options:
- Name Tag: Flow-IGW
- Click Yes, Create
The new gateway will be displayed as Detached. Right click on the Flow-IGW gateway and select Attach to VPC, then select Flow-VPC and click Yes, Attach.
Click on Route Tables on the left navigation menu.
If it exists, select the route table already associated with Flow-VPC. If not, make a new route table and associate it with Flow-VPC. Click on the new route table, then click the Routes tab toward the bottom of the page. The route Destination = 10.0.0.0/16 Target = local should already be present. Click Edit, then click Add another route and set the following parameters:
- Destination: 0.0.0.0/0
- Target set to Flow-IGW (the internet gateway that was just created)
Click Save
Close the VPC Dashboard browser tab and go back to the EC2 Management Console tab. Note that you should still be on Step 3: Configure Instance Details.
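
The CIDR blocks chosen above only work if Flow-Subnet lies entirely inside Flow-VPC. The following is a minimal POSIX shell sketch for checking that before typing the values into the console. The function names ip_to_int and cidr_contains are illustrative and are not part of any AWS or Partek tooling.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if CIDR block $2 lies entirely inside CIDR block $1.
cidr_contains() {
    outer_len=${1#*/}
    inner_len=${2#*/}
    # The inner block must be at least as specific as the outer block.
    [ "$inner_len" -ge "$outer_len" ] || return 1
    mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
    outer_int=$(ip_to_int "${1%/*}")
    inner_int=$(ip_to_int "${2%/*}")
    [ $(( outer_int & mask )) -eq $(( inner_int & mask )) ]
}

# Values from this guide: Flow-VPC and Flow-Subnet.
if cidr_contains 10.0.0.0/16 10.0.1.0/24; then
    echo "Flow-Subnet fits inside Flow-VPC"
else
    echo "Flow-Subnet is outside Flow-VPC"
fi
```

The same check can be reused if you pick different address ranges for your own deployment.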

Click the refresh arrow next to Create New Subnet and select Flow-Subnet.

Auto-assign Public IP: Use subnet setting (Disable)

Placement Group: No placement group

IAM role: None.

Note: For multi-node Partek Flow deployments or instances where you would like Partek to manage AWS resources on your behalf, please see Partek AWS support and set up an IAM role for your Partek Flow EC2 instance. In most cases a specialized IAM role is unnecessary and we only need instance ssh keys.

Shutdown Behaviour: Stop

Enable Termination Protection: select Protect against accidental termination

Monitoring: leave Enable CloudWatch Detailed Monitoring disabled

EBS-optimized Instance: Make sure Launch as EBS-optimized Instance is enabled. Given the recommended choice of an m4 instance type, EBS optimization should be enabled at no extra cost.

Tenancy: Shared - Run a shared hardware instance

Network Interfaces: leave as-is

Advanced Details: leave as-is

Click Next: Add Storage. You should be on Step 4: Add Storage

For the existing root volume, set the following options:

Size: 8 GB
Volume Type: Magnetic
Select Delete on Termination
- Note: All Partek Flow data is stored on a non-root EBS volume. Since only the OS is on the root volume and the instance is not frequently rebooted, a fast root volume is probably not necessary or worth the cost. For more information about EBS volumes and their performance, see the section EBS volumes.

Click Add New Volume and set the following options:

Volume Type: EBS
Device: /dev/sdb (take the default)
Do not define a snapshot
Size (GiB): 500
- Note: This is the minimum for ST1 volumes, see: EBS volumes
Volume Type: Throughput optimized HDD (ST1)
Do not delete on terminate or encrypt

Click Next: Add Tags

You do not need to define any tags for this new EC2 instance, but you can if you would like.

Click Next: Configure Security Group

For Assign a Security Group select Create a New Security Group
Security Group Name: Flow-SG
Description: Security group for Partek Flow server
Add the following rules:
- SSH set Source to My IP (or the address range of your company or institution)
- Click Add Rule:
- Set Type to Custom TCP Rule
- Set Port Range to 8080
- Set Source to anywhere (0.0.0.0/0, ::/0)
  - Note: It is recommended to restrict Source to just those that need access to Partek Flow.

Click Review and Launch

The AWS console will suggest that this server not be booted from a magnetic volume. Since there is not a lot of IO on the root partition and reboots will be rare, choosing Continue with Magnetic will reduce costs. Choosing an SSD volume will not provide substantial benefit, but it is OK if you wish to use one. See the EBS Volumes section for more information.

Click Launch

Create a new keypair:

Name the keypair Flow-Key
Download this keypair, then run chmod 600 Flow-Key.pem (the downloaded key) so it can be used.
Back up this key, as you may lose access to the Partek Flow instance without it.

The new instance will now boot. Use the left navigation bar and click on Instances. Click the pencil icon and assign the instance the name Partek Flow Server

Enabling External Access to the Partek Flow Elastic Compute Cloud Instance

The server should be assigned a fixed IP address. To do this, click on Elastic IPs on the left navigation menu from the EC2 Management Console.

Click Allocate New Address
Assign Scope to VPC
Click Allocate

On the table containing the newly allocated elastic IP, right click and select Associate Address

For Instance, select the instance named Partek Flow Server
For Private IP, select the one private IP available for the Partek Flow EC2 instance, then click Associate

Note: For the remaining steps, we refer to the Elastic IP as elastic.ip

SSH to the new Flow-Server instance:
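
As a rough sketch, the connection can be assembled from the pieces created so far. It assumes the key was saved as Flow-Key.pem in the current directory and that the login user is ubuntu, which is the default on Ubuntu AMIs. The helper name flow_ssh_command is illustrative, and elastic.ip remains a placeholder for your allocated address.

```shell
ELASTIC_IP="elastic.ip"      # placeholder: replace with the allocated Elastic IP
KEY_FILE="Flow-Key.pem"      # key pair downloaded at launch (chmod 600 applied)

# Print the ssh command rather than running it, so it can be reviewed first.
flow_ssh_command() {
    printf 'ssh -i %s ubuntu@%s\n' "$KEY_FILE" "$ELASTIC_IP"
}

flow_ssh_command
```

If the connection is refused, check that the SSH rule in Flow-SG includes the address you are connecting from.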

Attaching the Amazon Elastic Block Store Volume for Partek Flow Data Storage

Attach, format, and move the ubuntu home directory onto the large ST1 elastic block store (EBS) volume. All Partek Flow data will live in this volume. Consult the AWS EC2 documentation for further information about attaching EBS volumes to your instance.

Note: Under Volumes in the EC2 management console, inspect Attachment Information. It will likely list the large ST1 EBS volume as attached to /dev/sdb. Replace "s" with "xv" to find the device name to use for this mkfs command.
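
The renaming rule in the note above can be sketched as a small helper. It assumes a Xen-based instance type such as the m4 family recommended in this guide and an ext4 file system (the type used in the /etc/fstab entry later in this section). The function name guest_device is illustrative. The format command is only printed, because running it destroys any data on the device.

```shell
# Map the device name shown in the EC2 console (e.g. /dev/sdb) to the name
# seen inside the instance (e.g. /dev/xvdb) by replacing "s" with "xv".
guest_device() {
    case "$1" in
        /dev/sd*) echo "/dev/xvd${1#/dev/sd}" ;;
        *)        echo "$1" ;;    # leave unrecognized names untouched
    esac
}

DEVICE=$(guest_device /dev/sdb)

# Print, rather than run, the destructive format command for review.
echo "sudo mkfs -t ext4 $DEVICE"
```

Running lsblk on the instance before formatting is a simple way to confirm that the device name and size match the 500 GiB ST1 volume.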

Make a note of the newly created UUID for this volume

Copy the ubuntu home directory onto the EBS volume using a temporary mount point:

Make the EBS volume mount at system boot:

Add the following to /etc/fstab: UUID=the-UUID-from-the-mkfs-command-above /home ext4 defaults,nofail 0 2
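
A minimal sketch of building that entry from the UUID follows. The helper name fstab_entry and the sample UUID are made up for illustration. On the instance, the real UUID could be read with sudo blkid -s UUID -o value followed by the device name.

```shell
# Build the /etc/fstab entry for the Partek Flow data volume from its UUID.
fstab_entry() {
    printf 'UUID=%s /home ext4 defaults,nofail 0 2\n' "$1"
}

# A made-up UUID, used here purely for illustration.
fstab_entry "01234567-89ab-cdef-0123-456789abcdef"
```

After editing /etc/fstab, running sudo mount -a surfaces mistakes in the new entry before the next reboot.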

Disconnect the ssh session, then log in again to make sure all is well

Installing Partek Flow on a New Elastic Compute Cloud Instance

Note: For additional information about Partek Flow installations, see our generic Installation Guide

Before beginning, send the media access control (MAC) address of the EC2 instance to licensing@partek.com. The output of ifconfig will suffice. Given this information, Partek employees will create a license for your AWS server. MAC addresses will remain the same after stopping and starting the Partek Flow EC2 instance. If the MAC address does change, let our licensing department know and we can add your license to our floating license server or suggest other workarounds.
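
If ifconfig is not installed (recent Ubuntu releases no longer include it by default), the MAC address can also be read from the Linux sysfs tree. The sketch below assumes the standard /sys/class/net layout. The function name list_macs is illustrative.

```shell
# Print "interface MAC" for each non-loopback network interface.
# The optional argument overrides the sysfs directory (useful for testing).
list_macs() {
    net_dir=${1:-/sys/class/net}
    for iface in "$net_dir"/*; do
        [ -r "$iface/address" ] || continue
        name=${iface##*/}
        [ "$name" = "lo" ] && continue      # skip the loopback interface
        printf '%s %s\n' "$name" "$(cat "$iface/address")"
    done
}

list_macs
```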

Install required packages for Partek Flow:

Install Partek Flow:

Note: Make sure you are running as the ubuntu user.

Partek Flow has finished loading when you see INFO: Server startup in xxxxxxx ms in the partek_flow/logs/catalina.out log file. This takes ~30 seconds.
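
Rather than watching the log by hand, the startup message can be polled for. This is a minimal sketch that assumes the log path given above. The function names flow_started and wait_for_flow and the 120-second default timeout are illustrative choices.

```shell
# Succeed once the given log file contains the server startup message.
flow_started() {
    grep -q 'Server startup in' "$1" 2>/dev/null
}

# Poll the log once per second until the message appears or the
# timeout (in seconds, default 120) elapses.
wait_for_flow() {
    log=$1
    timeout=${2:-120}
    waited=0
    until flow_started "$log"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Example invocation on the server:
# wait_for_flow "$HOME/partek_flow/logs/catalina.out" && echo "Partek Flow is up"
```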

Enter license key

Set up the Partek Flow admin account

Leave the library file directory at its default location and check that the free space listed for this directory is consistent with what was allocated for the ST1 EBS volume.

Done! Partek Flow is ready to use.

Partek Amazon Web Services Support

After the EC2 instance is provisioned, we are happy to assist with setting up Partek Flow or address other issues you encounter with the usage of Partek Flow. The quickest way to receive help is to allow us remote access to your server by sending us Flow-Key.pem and amending the SSH rule for Flow-SG to include access from IP 97.84.41.194 (Partek HQ). We recommend sending us the Flow-Key.pem via secure means. The easiest way to do this is with the following command:

We also provide live assistance via GoTo meeting or TeamViewer if you are uncomfortable with us accessing your EC2 instance directly. Before contacting us, please run $ ./partek_flow/flowstatus.sh to send us logs and other information that will assist with your support request.

General Recommendations

The network performance of the EC2 instance type becomes an important factor if Partek Flow is used primarily for alignment. In this use case, large amounts of data move back and forth between the Partek Flow server and the end users (input fastq files in, output bam files out), so it is important to have what AWS refers to as high network performance, which in most cases is around 1 Gb/s. If the focus is primarily on downstream analysis and visualization (e.g. the primary input files are ADAT) then network performance is less of a concern.

We recommend HVM virtualization, as we have not seen any performance impact from using it and non-HVM instance types can come with significant deployment barriers.

Make sure your instance is EBS optimized by default and you are not charged a surcharge for EBS optimization.

T-class servers, although cheap, may slow responsiveness for the Partek Flow server and generally do not provide sufficient resources.

We do not recommend placing any data on instance store volumes since all data is lost on those volumes after an instance stops. This is too risky as there are cases where user tasks can take up unexpected amounts of memory forcing a server stop/reboot.

Amazon Web Services Instance Type Resources and Costs

Single server recommendation: m4.xlarge or m4.2xlarge

Elastic Block Store Volumes

Choice of a volume type and size:

This is dependent on the type of workload. For most users, the Partek Flow server tasks are alignment-heavy, so we recommend a throughput optimized HDD (ST1) EBS volume since most aligner operations are sequential in nature. For workloads that focus primarily on downstream analysis, a general purpose SSD volume will suffice, but the costs are greater. For those who focus on alignment or host several users, the storage requirements can be high. ST1 EBS volumes have the following characteristics:

Max throughput 500 MiB/s

$0.045 per GB-month of provisioned storage ($22.50 per month for 500 GB of storage).
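
The monthly figure is simply the provisioned size multiplied by the per-GB price, which makes it easy to estimate larger volumes. The sketch below uses the $0.045 price quoted above as its default. Current AWS pricing may differ, and the function name st1_monthly_cost is illustrative.

```shell
# Monthly cost in USD of a provisioned ST1 volume.
# $1 = size in GB, $2 = price per GB-month (defaults to the figure quoted above).
st1_monthly_cost() {
    awk -v gb="$1" -v price="${2:-0.045}" 'BEGIN { printf "%.2f\n", gb * price }'
}

st1_monthly_cost 500      # the 500 GB volume used in this guide
st1_monthly_cost 4000     # a hypothetical 4 TB volume
```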

Multi-Node Cluster Installation

Partek Flow is a genomics data analysis and visualization software product designed to run on compute clusters. The following instructions assume the most basic setup of Partek Flow and must only be attempted by system administrators who are familiar with Linux-based commands. These instructions are not intended to be comprehensive. Cluster environments are largely variable, thus there are no 'one size fits all' instructions. The installation procedure on a computer cluster is highly dependent on the type of computer cluster and the environment in which it is located. We can support a large array of Linux distributions and configurations. In all cases, Partek Technical Support will be available to assist with cluster installation and maintenance to ensure compatibility with any cluster environment. Please consult with Partek Licensing Support (licensing@partek.com) for additional information.

Prior to installation, make sure you have the license key related to the host-ID of the compute cluster the software will be installed in. Contact licensing@partek.com for key generation.

Installation on a Computer Cluster

Make a standard linux user account that will run the Partek Flow server and all associated processes. It is assumed this account is synced between the cluster head node and all compute nodes. For this guide, we name the account flow

Log in to the flow account and cd to the flow home directory

Download Partek Flow and the remote worker package

Unzip these files into the flow home directory /home/flow. This yields two directories: partek_flow and PartekFlowRemoteWorker
Partek Flow can generate large amounts of data, so it needs to be configured to store the bulk of this data in the largest shared data store available. For this guide we assume that the directory is located at /shared. Adjust this path accordingly.
It is required that the Partek Flow server (which is running on the head node) and remote workers (which are running on the compute nodes) see identical file system paths for any directory Partek Flow has read or write access to. Thus /shared and /home/flow must be mounted on the Flow server and all compute nodes. Create the directory /shared/FlowData and allow the flow linux account write access to it.
It is assumed the head node is attached to at least two separate networks: (1) a public network that allows users to log in to the head node and (2) a private backend network that is used for communication between compute nodes and the head node. Clients connect to the Flow web server on port 8080, so adjust the firewall to allow inbound connections to 8080 over the public network of the head node. Partek Flow will connect to remote workers over your private network on ports 2552 and 8443, so make sure those ports are open to the private network on the flow server and workers.
Partek Flow needs to be informed of what private network to use for communication between the server and workers. It is possible that there are several private networks available (gigabit, infiniband, etc.) so select one to use. We recommend using the fastest network available. For this guide, let's assume that private network is 10.1.0.0/16. Locate the headnode hostname that resolves to an address on the 10.1.0.0/16 network. This must resolve to the same address on all compute nodes.
For example:

host head-node.local yields 10.1.1.200

Open /home/flow/.bashrc and add this as the last line:

Source .bashrc so the environment variable CATALINA_OPTS is accessible.

NOTE: If workers are unable to connect (below), then replace all hostnames with their respective IPs.

Start Partek Flow

You can monitor progress by tailing the log file partek_flow/logs/catalina.out. After a few minutes, the server should be up.
Make sure the correct ports are bound

You should see 10.1.1.200:2552 and :::8080 as LISTENing. Inspect catalina.out for additional error messages.
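
One way to check for the two listeners is to filter a socket listing. This is a minimal sketch that assumes netstat-style output, where the local address column precedes the LISTEN state. The function name has_listener and the sample lines are illustrative.

```shell
# Read a socket listing on stdin and succeed if something is listening on port $1.
has_listener() {
    grep -Eq "[:.]$1[[:space:]].*LISTEN"
}

# Sample lines in the style of `netstat -tln` output (illustrative only).
SAMPLE='tcp   0  0 10.1.1.200:2552   0.0.0.0:*   LISTEN
tcp6  0  0 :::8080           :::*        LISTEN'

for port in 2552 8080; do
    if printf '%s\n' "$SAMPLE" | has_listener "$port"; then
        echo "port $port is bound"
    else
        echo "port $port is NOT bound"
    fi
done
```

On the head node, the sample text would be replaced by live output, for example netstat -tln | has_listener 2552.
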
Enter the license key provided (Figure 1)

If there appears to be an issue with the license or there is a message about 'no workers attached', then restart Partek Flow. It may take 30 sec for the process to shut down. Make sure the process is terminated before starting the server back up:

Then run:

You will now be prompted to setup the Partek Flow admin user (Figure 2). Specify the username (admin), password and email address for the administrator account and click Next

Select a directory to store the library files that will be downloaded or generated by Partek Flow (Figure 3). All Partek Flow users share library files, and the size of the library folder can grow significantly. We recommend allocating at least 100 GB of free space for library files. The free space in the selected library file directory is shown. Click Next to proceed. You can change this directory after installation by changing system preferences. For more information, see Library file management.

To set up the Partek Flow data paths, click on Settings located on the top-right of the Flow server webpage. On the left, click on Directory permissions then Permit access to a new directory. Add /shared/PartekFlow and allow all users access.
Next click on System preferences on the left menu and change data download directory and default project output directory to /shared/PartekFlow/downloads and /shared/PartekFlow/project_output respectively

Note: If you do not see the /shared folder listed, click on the Refresh folder list link toward the bottom of the download directory dialog

Since you do not want to run any work on the head node, go to Settings>System preferences>Task queue and job processing and uncheck Start internal worker at Partek Flow server startup.
Restart the Flow server:

After 30 seconds, run:

This is needed to disable the internal worker.

Test that remote workers can connect to the Flow server
Log in as the flow user to one of your compute nodes. Assume the hostname is compute-0. Since your home directory is exported to all compute nodes, you should be able to go to /home/flow/PartekFlowRemoteWorker/
To start the remote worker:

These two addresses should both be in the 10.1.0.0/16 address space. The remote worker will output to stdout when you run it. Scan for any errors. You should see the message woot! I'm online.
A successfully connected worker will show up on the Resource management page on the Partek Flow server. This can be reached from the main homepage or by clicking Resource management from the Settings page. Once you have confirmed the worker can connect, kill the remote worker (CTRL-C) from the terminal in which you started it.
Once everything is working, return to library file management and add the genomes/indices required by your research team. If Partek hosts these genomes/indices, these will automatically be downloaded by Partek Flow

Integration with your queueing system

In effect, all you are doing is submitting the following command as a batch job to bring up remote workers:

The second parameter for this script can be obtained automatically via:

Bringing up workers

Bring up workers by running the command below. You only need to run one worker per node:

Shutting down workers

Go to the Resource management page and click on the Stop button (red square) next to the worker you wish to shut down. The worker will shut down gracefully, as in it will wait for currently running work on that node to finish, then it will shut down.

Updating Partek Flow

For a cluster update, Partek support will provide links to the .zip files for Partek Flow and for the remote Flow worker. All of the following actions should be performed as the Linux user that runs Flow. Do NOT run Flow as root.

Go to the Flow installation directory. This is usually the home directory of the Linux user that runs Flow and it should contain a directory named "partek_flow". The location of the Flow install can also be obtained by running ps aux | grep flow and examining the path of the running Flow executable.
Shut down Flow:

Download the new version of Flow and the Flow worker:

Make sure Flow has exited:

The flow process should no longer be listed.

Unpack the new version of Flow install and backup the old install:

Backup the Flow database folder. This should be located in the home directory of the user that runs Flow.

Start the updated version of Flow:

(make sure there is nothing of concern in this file when starting up Flow. You can stop the file tailing by typing: CTRL-C)

You may also want to examine the main Flow log for errors:

Creating Restricted User Folders within the Partek Flow server

Partek Flow provides the infrastructure to isolate data from different users within the same server. This guide will provide general instructions on how to create this environment within Partek Flow. This can be modified to accommodate existing file systems already accessible to the server.

Go to Settings > Directory permissions and restrict parent folder access (typically /home/flow) to Administrator accounts only

Click the Permit access to a new directory button and navigate to the folder with your library files (typically /home/flow/FlowData/library_files). Select the All users (automatically updated) checkbox to permit all users (including those that will be added in the future) to see the library files associated with the Partek Flow server

Then go to System preferences > Filesystem and storage and set the Default project output directory to "Sample file directory"

Create your first user and select the Private directory checkbox. Specify where the private directory for that user is located

If needed, you can create a user directory by clicking Browse > Create new folder

This automatically sets browsing permissions for that private directory to that user

When a user creates a project, the default project output directory is now within their own restricted folder

More importantly, other users cannot see it

Add additional users as needed

Updating Partek Flow

Before performing updates, we recommend Backing Up the Database.

Updates are applied using the Linux package manager.

Make sure Partek Flow is stopped before updating it.

To update Partek Flow, open a terminal window and enter the following command.

For Debian/Ubuntu, enter:

For Redhat/Fedora/CentOS, enter:

For the YUM package manager, if updating Partek Flow fails with a message claiming "package not signed," enter:

Note that our packages are signed and the message above is erroneous.

For tomcat build update, download the latest version from below:

Uninstalling Partek Flow

Linux

Open a terminal window and enter the following command.

Debian/Ubuntu:

RedHat/Fedora/CentOS:

The uninstall removes binaries only (/opt/partek_flow). The logs, database (partek_db), and files in the /home/flow/.partekflow folder will remain unaffected.

MacOS

Stop and quit Partek Flow using the Partek Flow app in the menu.
Using Finder, delete Flow application from the Applications menu.

Figure 1. Control of Partek Flow through the menu bar

This process does not delete data or the library files. Users who wish to delete those can delete them using Finder or terminal. The default location of project output files and library files is the /FlowData directory under the user's home folder. However, the actual location may vary depending on your System or Project specific settings.

Dependencies

Flow ships with tasks that do not have all of their dependencies included. On startup Flow will attempt to install the dependencies, but not every system is equipped to install them.

In the case of any difficulties, it is highly recommended to use a Docker deployment instead (cluster installations may require Singularity, which is still somewhat a work in progress).

CNVkit

Requires Python 2.7 or later.

On startup Flow will attempt to install additional python packages using the command

pip install --user cnvkit==0.9.5

Requires R 3.2.3 or later.

On startup Flow will attempt to install additional R packages.

There are cascading dependencies, but you can view the core libraries in partek_flow/bin/cnvkit-0.8.5/install.R

If these packages can't be built locally, it may be possible for the user to download them from us (see below).

DESeq2

Requires R 3.0 or later.

On startup Flow will attempt to install additional R packages.

There are cascading dependencies, but you can view the core libraries in partek_flow/bin/deseq_two-3.5/install.R

If these packages can't be built locally, it may be possible for the user to download them from us (see below).

RcppArmadillo may also have dependencies on multi-threading shared objects that may not be on the LD_LIBRARY_PATH

The recommendation is to copy those .so files to a folder and make sure it is available from the LD_LIBRARY_PATH when the server/worker starts.

Additional dynamic libraries (such as libxml2.so) may be missing and we can provide a copy appropriate for the target OS.

HTSeq

Requires Python 2.7 or 3.4 or above

On startup Flow attempts to install using pip

MACS3

Requires python 3.0 or above

pip install --user numpy==1.19.5 Cython==0.29.30 cykhash==2.0.0 macs3==3.0.0a7

Python

If there are any conflicts with preinstalled python packages, Flow should be configured to run with its own virtual environment:

pip install virtualenv

virtualenv ~/.partekflow/.local

source ~/.partekflow/.local/bin/activate

pip install HTSeq==0.11.0

pip install cnvkit==0.9.5

wget customer.partek.com/python-dependencies.zip

unzip python-dependencies.zip -d ~/.partekflow/

R

R can usually be installed from the package manager. If the user installs Flow via apt or yum it should already be installed.

For older operating systems R is not available and will need to be installed from source

Currently, we offer a set of R packages compatible with some versions of R

Extract this file in the home directory. (Make .R a symlink if the home directory doesn't have enough free space)

These packages include the dependencies for both CNVkit and DESeq2

When running R diagnostic commands outside flow, it can simplify things if the environment includes a reference to the ~/.R folder:

export R_LIBS_USER=$HOME/.R

or load

.libPaths("~/.R")

in ~/.Rprofile

list loaded packages:

(.packages())

get the version:

packageVersion("packageName")

R_HOME=/path/to/R

Variant Effect Predictor

This is a compiled Perl script, so it has no direct dependency on Perl itself. However, we have had one report (istem.fr) of it failing to run.

DECoN

DECoN comes pre-installed in the flow_dna container registry.partek.com/flow_dna

Documentation on installing DECoN is available here: https://github.com/RahmanTeam/DECoN/blob/master/DECoN-v1.0.2.pdf

DECoN requires R version 3.1.2

R 3.1.2 must be installed under /opt/R-3.1.2; otherwise, set the DECON_R environment variable to its folder

wget http://cran.wustl.edu/src/base/R-3/R-3.1.2.tar.gz

tar xfz R-3.1.2.tar.gz

cd R-3.1.2

./configure --with-x=no && make

Download DECoN

https://github.com/RahmanTeam/DECoN/archive/refs/tags/v1.0.2.zip

and install it under /opt/DECoN or set the DECON_PATH environment variable to its folder

You may need to add

symlink.system.packages: TRUE

to Linux/packrat/packrat.opts

Docker and Docker-compose

Docker

Docker can be used alongside Partek Flow to deploy an easy-to-maintain environment that has no dependency issues and is easy to relocate among different servers if needed.

One can follow the Docker documentation to install and get started.

“Docker is a platform for developers and sysadmins to build, run, and share applications with containers. The use of containers to deploy applications is called containerization.”

Useful commands

docker ps

This command will output the details of the currently running containers including port forwarding, container name/id, and uptime.

docker exec

This command allows us to enter the running container’s environment to troubleshoot any issues with the container. (Containers are not meant to be changed; the correct way to deal with an issue is to create a new container after troubleshooting.)

Docker-compose

“Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.”

Documentation

Partek will work with the customer to make a docker-compose file that will have all the configuration necessary to run Partek Flow on any machine that meets our Minimum system requirements.

Below is an example of a docker-compose.yml file which can be used to bring up a Partek Flow server with an extra worker.

These are some of the important tags shown above:

restart: whether the container should be restarted automatically upon failure or system restart.
image: the image we distribute and the desired version. We always recommend running the latest version of the software; if you need a specific version of Partek Flow, please visit here.
environment: any environment variables to be set inside the container.
ports: the default port for Partek Flow is 8080. To make the server accessible on a different port, change the first part (left of the colon) of 8080:8080. For example, to access the server on port 8888, use 8888:8080.
mac_address: this needs to match your license file.
volumes: the folders on the server to be shared with the container, so that files created and used in the container persist and remain accessible.
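
The port mapping is the setting most often changed, so a short sketch of how the two halves are read may help. The helper names host_port and container_port are illustrative and are not part of Docker or Compose.

```shell
# Split a Compose "HOST:CONTAINER" port mapping into its two halves.
host_port()      { echo "${1%%:*}"; }    # the part left of the colon
container_port() { echo "${1##*:}"; }    # the part right of the colon

MAPPING="8888:8080"
echo "Browse to port $(host_port "$MAPPING") on the server;" \
     "Partek Flow listens on $(container_port "$MAPPING") inside the container"
```

Only the left-hand value should be changed. The right-hand value stays at 8080 because that is the port Partek Flow uses inside the container.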

Java KeyStore and Certificates

Java Keystore

JKS, or Java KeyStore, is used in Flow for some very specific scenarios where asymmetric encryption is needed.

Partek Flow ships with its own Java KeyStore. The file is found at .../partek_flow/distrib/flowkeystore, where you may want to add your public and private certificates.

Adding a certificate to the KeyStore

If you already have a certificate please skip to the next step.

Create a certificate

Please place the key in a secure folder. It is advisable to place it in Flow's home directory, e.g. /home/flow/keys.

[~] openssl genrsa -out flow.key 2048

Alternatively, to generate an elliptic-curve key instead of an RSA key:

[~] openssl ecparam -genkey -name secp384r1 -out flow.key

[~] openssl req -new -x509 -sha256 -key flow.key -out flow.crt -days 3650

The commands above are meant to be used in a terminal. There are other ways to make a certificate, but they are not covered here.

If you wish to understand the flags used above please refer to the OpenSSL documentation.

Import a certificate into flowkeystore

For this step you will need to find where the cacerts file is located. It is under the Java installation; if you do not know how to find it, contact us and we can help.

In the example the cacerts file is located at /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts

[~] keytool -import -file /home/flow/.partekflow/keys/flow.crt -alias someName -keystore /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts -storepass changeit -noprompt

Tell the JVM where to find the key

We need to tell Partek Flow where the key is located. To do this, edit the file that contains some of the Flow settings.

The file is usually located at /etc/partekflow.conf. If you do not have this file, we advise using the bashrc file of the system user that runs Partek Flow.

At the end of that file please add:

export CATALINA_OPTS="$CATALINA_OPTS -Djavax.net.ssl.trustStore=${HOME}/keys"

Kubernetes

Below are the yaml documents which describe the bare minimum infrastructure needed for a functional Flow server. It is best to start with a single-node proof of concept deployment. Once that works, the deployment can be extended to multi-node with elastic worker allocation. Each section is explained below.

The Flow headnode pod

apiVersion: v1
kind: Pod
metadata:
  name: flowheadnode
  namespace: partek-flow
  labels:
    app.kubernetes.io/name: flowheadnode
    deployment: dev
spec:
  securityContext:
    fsGroup: 1000
  containers:
    - name: flowheadnode
      image: xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/partek-flow:current-23.0809.22
      resources:
        requests:
          memory: "16Gi"
          cpu: 8
      env:
        - name: PARTEKLM_LICENSE_FILE
          value: "@flexlmserver"
        - name: PARTEK_COMMON_NO_TOTAL_LIMITS
          value: "1"
        - name: CATALINA_OPTS
          value: "-DFLOW_WORKER_MEMORY_MB=1024 -DFLOW_WORKER_CORES=2 -Djavax.net.ssl.trustStore=/etc/flowconfig/cacerts -Xmx14g"
      volumeMounts:
        - name: home-flow
          mountPath: /home/flow
        - name: flowconfig
          readOnly: true
          mountPath: "/etc/flowconfig"
  volumes:
    - name: home-flow
      persistentVolumeClaim:
        claimName: partek-flow-pvc
    - name: flowconfig
      secret:
        secretName: flowconfig

Pod metadata

On a kubernetes cluster, all Flow deployments are placed in their own namespace, for example namespace: partek-flow. The label app.kubernetes.io/name: flowheadnode allows a service to be bound to this headnode pod and lets other kubernetes infrastructure target it. The label deployment: dev allows running multiple Flow instances in this namespace (dev, tst, uat, prd, etc.) if needed and allows workers to connect to the correct headnode. For stronger isolation, run each Flow instance in its own namespace.

Data storage

The Flow docker image has the following storage requirements:

1) A writable volume mounted to /home/flow.
2) This volume needs to be readable and writable by UID:GID 1000:1000.
3) For a multi-node setup, this volume needs to be cross mounted to all worker pods. In this case, the persistent volume would be backed by some network storage device such as EFS, NFS, or a mounted FileGateway.

This section achieves goal 2)

spec:
  securityContext:
    fsGroup: 1000

The flowconfig volume is used to override behavior for custom Flow builds and custom integrations. It is generally not needed for vanilla deployments.

The Flow docker image

Partek Flow is shipped as a single docker image containing all necessary dependencies. The same image is used for worker nodes. Most deployment-related configuration is set as environment variables. Auxiliary images are available for additional supporting infrastructure, such as flexlm and worker allocator images.

Official Partek Flow images can be found on our release notes page: Release Notes. The image tags assume the format registry.partek.com/rtw:YY.MMMM.build. New rtw images are generally released several times a month. The image in the example above references a private ECR. It is highly recommended that the target image from registry.partek.com be loaded into your ECR. Image pulls will be much faster from AWS, which reduces the time to dynamically allocate workers. It also removes a single point of failure: if registry.partek.com were down, it would impact your ability to launch new workers on demand.

Flow headnode resource request

Partek Flow uses the head node to handle all interactive data visualization. Additional CPU resources are needed for this; more is better, and 8 CPUs is a good place to start. As for memory, we recommend 8 to 16 GiB. Resource limits are not included here, but are set to large values globally:

# This allows us to create pods with only a request set, but not a limit set. Further tuning is recommended. 
apiVersion: v1
kind: LimitRange
metadata:
  name: partek-flow-limit-range
spec:
  limits:
    - max:
        memory: 512Gi
        cpu: 64
      default:
        memory: 512Gi
        cpu: 64
      defaultRequest:
        memory: 4Gi
        cpu: 2
      type: Container

Relevant Flow headnode environment variables

PARTEKLM_LICENSE_FILE

Partek Flow uses FlexLM for licensing. Currently we do not offer any alternative. Values for this environment variable can be:

@flexlmserveraddress - An external flexlm server. We provide a Partek specific container image and detail a kubernetes deployment for this below. This license server can also live outside the kubernetes cluster, the only requirement being that it is network accessible.

/home/flow/.partekflow/license/Partek.lic - Use this path exactly. This path is internal to the headnode container and is persisted on a mounted PVC.

Unfortunately, FlexLM is MAC address based and does not quite fit in with modern containerized deployments. There is no straightforward or native way for kubernetes to set the MAC address upon pod/container creation, so using a license file on the flowheadnode pod (/home/flow/.partekflow/license/Partek.lic) could be problematic (but not impossible). In further examples below, we provide a custom FlexLM container that can be instantiated as a pod/service. This works by creating a new network interface with the requested MAC address inside the FlexLM pod.

PARTEK_COMMON_NO_TOTAL_LIMITS

Please leave this set at "1". Partek Flow need not enforce any limits as that is the responsibility of kubernetes. Setting this to anything else may result in Partek executables hanging.

CATALINA_OPTS

This is a hodgepodge of Java/Tomcat options. Parts of interest:

-DFLOW_WORKER_MEMORY_MB=1024 -DFLOW_WORKER_CORES=2

It is possible for the Flow headnode to execute jobs locally in addition to dispatching them to remote workers. These two options set resource limits on the Flow internal worker to prevent resource contention with the Flow server. If remote workers are not used and this remains a single-node deployment, meaning ALL jobs will execute on the internal worker, then it is best to remove the CPU limit (-DFLOW_WORKER_CORES) and only set -DFLOW_WORKER_MEMORY_MB equal to the kubernetes memory resource request.

-Djavax.net.ssl.trustStore=/etc/flowconfig/cacerts

If Flow connects to a corporate LDAP server for authentication, it will need to trust the LDAP certificates.

-Xmx14g

JVM heap size. If the internal worker is not used, set this to be a little less than the kubernetes memory resource request. If the internal worker is in use, and the intent is to stay with a single-node deployment, then set this to be ~ 25% of the kubernetes memory resource request, but no less than ~ 4 GiB.
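The sizing guidance above can be sketched as a small calculation. This is an illustrative sketch only, not part of Partek Flow: the function name is hypothetical, the 2 GiB headroom used for "a little less" is an assumption chosen to match the 16Gi request and -Xmx14g in the example pod, and rounding down to whole GiB is also an assumption.

```python
def suggest_heap_gib(request_gib: int, internal_worker: bool, headroom_gib: int = 2) -> int:
    """Suggest a JVM heap size in GiB for -Xmx, given the pod memory request in GiB."""
    if internal_worker:
        # Single-node deployment with the internal worker in use:
        # roughly 25% of the memory request, but no less than about 4 GiB.
        return max(request_gib // 4, 4)
    # Internal worker not used: a little less than the memory request.
    # The size of the headroom is an assumption, not a Partek recommendation.
    return max(request_gib - headroom_gib, 1)


if __name__ == "__main__":
    for request, internal in [(16, False), (16, True), (64, True)]:
        heap = suggest_heap_gib(request, internal)
        print(f"request={request}Gi internal_worker={internal} -> -Xmx{heap}g")
```

Treat the result as a starting point and adjust it for your own workload.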

The Flow headnode service definition

apiVersion: v1
kind: Service
metadata:
  name: flowheadnode
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
    - port: 2552
      targetPort: 2552
      protocol: TCP
      name: akka
    - port: 8443
      targetPort: 8443
      protocol: TCP
      name: licensing
  selector:
    app.kubernetes.io/name: flowheadnode

The flowheadnode service is needed 1) so that workers have a DNS name (flowheadnode) to connect to when they start and 2) so that we can attach an ingress route to make the Flow web interface accessible to end users. The app.kubernetes.io/name: flowheadnode selector is what binds this to the flowheadnode pod.

80:8080 - Users interact with Flow entirely over a web browser
2552:2552 - Workers communicate with the Flow server over port 2552
8443:8443 - Partek executed binaries connect back to the Flow server over port 8443 to do license checks
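When troubleshooting, it can help to confirm that these three service ports accept TCP connections from inside the cluster. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes it is run from a pod in the same namespace so that the DNS name flowheadnode resolves. It only tests that a TCP connection can be opened, not that Flow itself is healthy.

```python
import socket

# Service ports taken from the flowheadnode Service definition above.
FLOW_PORTS = {"http": 80, "akka": 2552, "licensing": 8443}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_flow_ports(host: str = "flowheadnode") -> dict:
    """Map each service port name to whether it accepted a TCP connection."""
    return {name: port_open(host, port) for name, port in FLOW_PORTS.items()}
```

For example, calling check_flow_ports() from a worker pod returns a dictionary such as {"http": True, "akka": True, "licensing": True} when all three ports are reachable.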

Ingress to flowheadnode

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flowheadnode
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  rules:
    - host: flow.dev-devsvc.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flowheadnode
                port:
                  number: 80

This provides external users HTTPS access to Flow at host: flow.dev-devsvc.domain.com. Your details will vary. This is where we bind to the flowheadnode service.

The flexlm service pod

# On a NEW deployment, you need to exec into this pod and add the license file
# to /usr/local/flexlm/licenses
# After a license file is present, the flexlm daemon will start automatically

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flexlmserver-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi     # flex.log is the only thing that slowly grows here
  storageClassName: gp2-ebs-sc
  volumeMode: Filesystem
---
apiVersion: v1
kind: Service
metadata:
  name: flexlmserver
spec:
  type: ClusterIP
  ports:
    - port: 27000
      targetPort: 27000
      protocol: TCP
      name: flexmain
    - port: 27001
      targetPort: 27001
      protocol: TCP
      name: flexvendor
  selector:
    app.kubernetes.io/name: flexlmserver
---
apiVersion: v1
kind: Pod
metadata:
  name: flexlmserver
  namespace: partek-flow
  labels:
    app.kubernetes.io/name: flexlmserver
spec:
  containers:
    - name: flexlmserver
      image: public.ecr.aws/partek-flow/kube-flexlm-server
      ports:
        - containerPort: 27000
        - containerPort: 27001
      resources:
        limits:
          memory: "256Mi"
          cpu: 1
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]
      volumeMounts:
        - name: flexlmserver-pvc
          mountPath: /usr/local/flexlm/licenses
  volumes:
    - name: flexlmserver-pvc
      persistentVolumeClaim:
        claimName: flexlmserver-pvc

The yaml documents above will bring up a complete Partek-specific license server.

Note that the service name is flexlmserver. The flowheadnode pod connects to this license server via the PARTEKLM_LICENSE_FILE="@flexlmserver" environment variable.

You should deploy this flexlmserver first, since the flowheadnode will need it available in order to start in a licensed state.

Partek will send a Partek.lic file licensed to some random MAC address. When this license is (manually) written to /usr/local/flexlm/licenses, the pod will continue execution by creating a new network interface using the MAC address in Partek.lic, then it will start the licensing service. This is why the NET_ADMIN capability is added to this pod.

The license from Partek must contain VENDOR parteklm PORT=27001 so the vendor port remains at 27001 in order to match the service definition above. Without this, this port is randomly set by FlexLM.
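Before copying a license file into the pod, the VENDOR line can be checked with a short script. This is a minimal sketch under stated assumptions: the function name is hypothetical, and it assumes the VENDOR line consists of whitespace-separated fields with the port written as PORT=number, as in the text above.

```python
import re


def vendor_port(license_text: str, vendor: str = "parteklm"):
    """Return the PORT value on the VENDOR line for `vendor`, or None if not set."""
    for line in license_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "VENDOR" and fields[1] == vendor:
            for field in fields[2:]:
                match = re.fullmatch(r"PORT=(\d+)", field)
                if match:
                    return int(match.group(1))
    return None


if __name__ == "__main__":
    # Illustrative line only, not a real license.
    sample = "VENDOR parteklm PORT=27001\n"
    port = vendor_port(sample)
    print("vendor port OK" if port == 27001 else f"unexpected vendor port: {port}")
```

A return value of None means the port is not pinned in the license, in which case FlexLM picks it at random and it will not match the service definition.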

This image is currently available from public.ecr.aws/partek-flow/kube-flexlm-server but this may change in the future.

Live Training Event Recordings

Here you will find videos of past live training webinars.

Bulk RNA-Seq Analysis Training
Basic scRNA-Seq Analysis & Visualization Training
Advanced scRNA-Seq Data Analysis Training
Bulk RNA-Seq and ATAC-Seq Integration Training
Spatial Transcriptomics Data Analysis Training
scRNA and scATAC Data Integration Training

Bulk RNA-Seq Analysis Training

Webinar Breakdown

Topics have been broken out into bite-size videos for your convenience.

Basic scRNA-Seq Analysis & Visualization Training

Webinar Breakdown

Topics have been broken out into bite-size videos for your convenience.

Advanced scRNA-Seq Data Analysis Training

Webinar Breakdown

Topics have been broken out into bite-size videos for your convenience.

Bulk RNA-Seq and ATAC-Seq Integration Training

Webinar Breakdown

Topics have been broken out into bite-size videos for your convenience.

Spatial Transcriptomics Data Analysis Training

Webinar Breakdown

Topics have been broken out into bite-size videos for your convenience.

scRNA and scATAC Data Integration Training

Webinar Breakdown

Topics have been broken out into bite-size videos for your convenience.

Tutorials

Partek Flow tutorials provide step-by-step instructions using a supplied data set to teach you how to use the software tools. Upon completion of each tutorial, you will be able to apply your knowledge in your own studies.

Creating and Analyzing a Project
Bulk RNA-Seq
Analyzing Single Cell RNA-Seq Data
Analyzing CITE-Seq Data
10x Genomics Visium Spatial Data Analysis
10x Genomics Xenium Data Analysis
Single Cell RNA-Seq Analysis (Multiple Samples)
Analyzing Single Cell ATAC-Seq data

Creating and Analyzing a Project

Partek Flow software manages separate experiments as projects. A complete project consists of input data, tasks used to analyze the data, the resulting output files, and a list of users involved in the analysis.

This chapter provides instructions on creating and analyzing a project and covers:

Creating a New Project
The Metadata Tab
The Analyses Tab
The Log Tab
The Project Settings Tab
The Attachments Tab
Project Management
Importing a GEO / ENA project

Creating a New Project

Using a web browser, log in to Partek Flow. From the Home page click the New Project button; enter a project name (Figure 1) and then click Create project.

The Project name is the basis of the default name of the output directory for this project. Project names are unique, thus a new project cannot have the same name as an existing project within the same Partek Flow server.

Once a new project has been created, the user is automatically directed to the Analyses tab of the Project View (Figure 2).

The Metadata Tab

The Partek Flow Metadata Tab has an option to import data, and is where sample/cell attributes are managed. This is also where users can modify the location of the project output folder.

Import data
Project output directory
Sample Annotation
Deleting or Renaming samples within a Project

Import data

The Metadata tab can be used to import data. To add samples to the project, click Add data under Import. The different import options are displayed in a cascading menu (Figure 1).

Automatically create samples from files

This method adds samples by creating them simultaneously as the data gets imported into a project. The sample names are assigned automatically based on filenames.

Before proceeding, you should ideally have already transferred the data you wish to analyze to a folder (with appropriate permissions) within the Partek Flow server. Please seek assistance from your system administrator in uploading your data directly.

Select the Automatically create samples from files button. The next screen will feature a file browser that will show any folders you have access to in the Partek Flow server (Figure 2). Select a folder by clicking the folder name. Files in the selected folder that have file formats that can be imported by Partek Flow will be displayed and tick-marked on the right panel. You can exclude some files from the folder by unselecting the check mark on the left side of the filename. When you have made your selections, click the Create sample button.

Alternatively, files can be uploaded and imported into the project from the user's local computer. Only use this option if your file size is less than 500MB. Select the My computer radio button (Figure 3) and the options for selecting the local file and the upload (destination) directory will appear. Only one file at a time can be imported to a project using this method.

Multiple data files can be compressed into a single .zip file before uploading. Partek Flow will automatically unzip the files and put them in the upload directory.

Please be aware that the method illustrated in Figure 3 depends heavily on the speed and latency of the Internet connection between the user's computer and the server where Partek Flow is installed. Given the large size of most genomics data sets, this method is not recommended in most cases.

After successful creation of samples from files, the Data tab now contains a Sample management table (Figure 4). The Sample name column in the table is automatically generated based on the filenames and the table is sorted in alphabetical order.

Clicking the Show data files link on the lower right side of the sample management table will expand the table and reveal the filenames of the files associated with each sample. Conversely, clicking on Hide data files will hide the file information.

The columns in the expanded view show the files associated with each sample. Files are organized by file type. Any filename extensions that indicate compression (such as .gz) are not shown.

Once a sample is created in a project, the files associated with it can be modified. In the expanded view, mouse over the +/- column of a sample. The highlighted icons will correspond to the options for the sample on that row.

Create a new blank sample

Samples can be added one at a time by selecting the Create a new blank sample option (Figure 5). In the following dialog box, type a sample name and click Create. This process creates a sample entry in the sample management table but there is no associated file with it, hence it is a "blank sample."

Expanding the Sample management table by clicking Show data files on the lower left corner of the table will reveal the option to associate files to the blank sample.

Importing count matrix data

Alternatively, if you have a matrix of data, such as raw read count data in text format, select Import count matrix. The requirements of this text file are listed below:

The file contains numeric values in a tab-delimited format. Samples can be on rows while features (e.g., gene names) are in columns, or vice versa
The file contains unique sample IDs and feature IDs
If the data contains sample attribute information, all these attributes have to be either
- The leftmost columns when samples are on rows (Figure 6)
- The first few rows when samples are on columns (Figure 7)
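The first two requirements can be checked before importing. The following is a minimal sketch, not part of Partek Flow: the function name is hypothetical, and it assumes the simplest layout, with samples on rows, sample IDs in the first column, feature IDs in the header row, and no sample attribute columns.

```python
import csv
import io


def check_count_matrix(text: str) -> list:
    """Return a list of problems found in a tab-delimited count matrix.

    Assumes samples on rows, feature IDs in the header row, sample IDs in
    the first column, and no sample attribute columns.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if len(rows) < 2:
        return ["file needs a header row and at least one sample row"]
    problems = []
    features = rows[0][1:]
    if len(set(features)) != len(features):
        problems.append("feature IDs are not unique")
    samples = [row[0] for row in rows[1:]]
    if len(set(samples)) != len(samples):
        problems.append("sample IDs are not unique")
    for row in rows[1:]:
        if len(row) != len(rows[0]):
            problems.append(f"{row[0]}: expected {len(rows[0])} columns, found {len(row)}")
            continue
        for feature, value in zip(features, row[1:]):
            try:
                float(value)
            except ValueError:
                problems.append(f"{row[0]}/{feature}: non-numeric value {value!r}")
    return problems


if __name__ == "__main__":
    example = "sample\tGeneA\tGeneB\nS1\t10\t0\nS2\t3\t7\n"
    print(check_count_matrix(example) or "no problems found")
```

An empty list means the file passed these checks. Files that carry sample attributes or have samples on columns would need the attribute columns or rows skipped first.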

Like all other input files, you can upload the file from the Partek Flow server, My Computer or via a URL. Uploading the file brings up a file preview window (Figure 8). The preview of the first few rows and columns of the text file should help you determine on which rows/columns the relevant counts are located (the preview will display up to 100 rows and 100 columns). Inspect the text preview and indicate the orientation of the text file under File format>Input format.

If the read counts are based on a compatible annotation file in Partek Flow, you can specify that annotation file under Gene/feature annotation. Select the appropriate genome build and annotation model for your count data. Select the Contain sample attributes checkbox if your data includes additional sample information.

The example above shows a text file with samples listed on rows. The gene IDs are compatible with the hg19 RefSeq Transcripts - 2016-08-01 annotation model. Under the Column information and Row information sections, indicate the location of the Sample ID, which in this case is Column 1. Indicate the sample attribute location by marking where it starts, which in the example is Column 2. Mark where the Feature IDs start; in this case they are gene IDs and start at Column 4.

If the data has been log transformed, specify the base under Counts format.

Project output directory

The project output directory is the folder within the Partek Flow server where all output files produced during analysis will be stored.

The default directory is configured by the Partek Flow Administrator under the Settings menu (under System Preferences > Default project output directory).

If the user does not override the default, the task output will go to a subdirectory with the name of the Project.

Sample Annotation

After samples have been added in the project, additional information about the samples can be added. Information such as disease type, age, treatment, or sex can be annotated to the data by assigning the Attributes for each sample.

Certain tasks in Partek Flow, such as Gene-Specific Analysis, require that samples be assigned attributes in order to do statistical comparisons between groups or samples. As attributes are added to the project, additional columns in the sample management table will be created.

Sample attributes

Attributes can be managed or created within a project. Under the Data tab, click the Manage attributes button to open the Manage attributes page (Figure 9).

To prepare for later data analysis using statistical tools, attributes can be either categorical or numeric (i.e., continuous).

Adding a categorical attribute

For categorical attributes, there are two levels of visibility. Project-specific categorical attributes are visible only within the current project. System-wide categorical attributes are visible across all the projects within the Partek Flow server, and are useful for maintaining uniformity of terms. Importing samples in a new project will retain the system-wide attributes, but not the project-specific attributes.

A feature of Partek Flow is the use of controlled vocabulary for categorical attributes, allowing samples to be assigned only within pre-defined categories. It was designed to effectively manage content and data and allow teams to share common definitions. The use of standard terms minimizes confusion.

To add a categorical attribute in the Manage attributes page, click Add new attribute (Figure 10). In the dialog box, type a Name for the attribute, select the Categorical radio button next to Attribute type, select the visibility of the attribute and then click the Add button.

Repeat the process for additional attributes of the samples in your study. When done, click Back to sample management table. Categorical attributes will default to Project-specific visibility.

To change the order of the attributes displayed on the Data tab, click an attribute name and drag and drop it. To change the order of the groups within an attribute, click a group name and drag and drop it vertically; the new order is reflected in visualizations.

Adding a numeric attribute

To add a numeric attribute in the Manage attributes page, click Add new attribute. In the dialog box (Figure 13), type a Name for the attribute and select the Numeric radio button next to Attribute type. Optional parameters for numeric attributes include the Minimum value, Maximum value, and Units. When done, click Add to return to the Manage attributes page. Repeat the process to add more numeric attributes. When done, click Back to sample management table.

Adding a system-wide attribute

Since system-wide attributes do not have to be created by the current user, they only need to be added to the sample management table in a project.

In the Data tab, click the Add a system-wide attribute button. In the dialog box that follows (Figure 14), a drop down menu is located next to Add attribute where you can select the System-wide attribute you would like to add to the project. Once selected, it will be recognized automatically as either a Categorical or a Numeric system-wide attribute.

For a System-wide categorical attribute, the different categories are listed and you have the option of pre-filling the columns with N/A (or any other category within the attribute). Click Add column and you will return to the Data Tab.

Assigning categories or values to attributes

After adding all the desired attributes to a project, the sample management table will show a new column for each attribute (Figure 15). The columns will initially show "N/A", as the samples have not yet been categorized or assigned a value. To edit the table, click Edit attributes. Assign the sample attributes by using a drop down for categorical attributes (controlled vocabulary) or typing with a keyboard for numeric attributes.

When all the attributes have been entered, click Apply changes and the sample management table will be updated. After editing the sample table, make sure there are no fields with blank or N/A values before proceeding. To rename or delete attributes, click Manage attributes from the Data tab to access the Manage attributes page.

Assigning attributes using a Sample Annotation Text File

Another way to assign attributes to samples in the Data tab is to use a text file that contains the table of attributes and categories/values. This table is prepared outside of Partek Flow using any text editing software capable of saving tab-separated text files.

Using a text editor, prepare a table containing the attributes. An example is shown in Figure 16. There should only be one tab between columns with no extra tabs after the last column. In this particular example, the first column contains the filename and the text file is saved as Sampleinfo.txt.

The first row of the table in the text file contains the attributes (as headers). The first column of the table in the text file, regardless of the header of the first column, should contain either the sample names or the file names of the samples already added in Partek Flow. The first column is the unique identifier that will match the samples to the correct values or categories.

To upload sample attributes, click Assign sample attributes from a file in the Data tab. Then indicate where the attribute text file is stored and navigate to it. Partek Flow will parse the text file and present attributes that will be available for import (Figure 17).

Select the attributes you want to import by clicking the Import check box. Imported attributes that do not currently exist in the project will create new project-specific attributes.

You can change the name of a specific attribute by editing the Attribute name text box. Columns containing letter characters are automatically selected as categorical attributes. Columns containing numbers are suggested to be numeric attributes and can be changed to categorical using the drop down menu under Attribute type.

Guidelines for preparing the sample annotation text file

The first column is always the unique identifier and can refer only to File names or Sample names.
If using Sample names in the first column, they must match the entries of the Sample name column in the Sample management table.
If using File names in the first column, use the filenames shown in the fastq column of the expanded sample management table (see Figure 4) then add the extension .gz. All filenames must include the complete file extension (e.g., Samplename.fastq.gz).
The header name of the first column of the table (top left cell of our text table) is irrelevant but should not be left blank. Whether the first column contains File names or Sample names will be chosen during the process.
The last column cannot have empty values.
Missing data (blank cells) can only be handled if the attribute is numeric. If it is categorical, please put a character in it.
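Several of these guidelines are structural and can be checked before uploading. The following is a minimal sketch, not part of Partek Flow: the function name is hypothetical, and it only checks the tab layout, the top-left header cell, identifier uniqueness, and the last column. It cannot tell categorical from numeric attributes, so blank cells in other columns are not flagged.

```python
def check_annotation(text: str) -> list:
    """Return a list of problems found in a tab-separated sample annotation table."""
    lines = [line for line in text.splitlines() if line != ""]
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    problems = []
    if header[0].strip() == "":
        problems.append("top-left header cell is blank")
    seen = set()
    for number, line in enumerate(lines[1:], start=2):
        cells = line.split("\t")
        if len(cells) != len(header):
            # Doubled or trailing tabs show up as extra columns.
            problems.append(f"line {number}: expected {len(header)} columns, found {len(cells)}")
            continue
        if cells[0] in seen:
            problems.append(f"line {number}: duplicate identifier {cells[0]!r}")
        seen.add(cells[0])
        if cells[-1].strip() == "":
            problems.append(f"line {number}: last column is empty")
    return problems


if __name__ == "__main__":
    # Illustrative table; the file names and attributes are made up.
    table = "Filename\tTreatment\tAge\nS1.fastq.gz\tcontrol\t34\nS2.fastq.gz\ttreated\t41\n"
    print(check_annotation(table) or "no problems found")
```

An empty list means the table passed these checks; matching the identifiers against the samples in the project still happens in Partek Flow during import.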

It is advisable to use Sample name as the first column identifier when:

Samples are associated with more than one file (for instance, paired-end reads and/or technical replicates).
The files were imported in the SRA format (from the Sequence Read Archive database). In Partek Flow, they are automatically converted to the FASTQ format. Consequently, their filenames change once they are imported. The new file names can be seen by expanding the sample management table; the new extension is .fastq.gz.

If attributes are assigned from two different text files, the following will happen:

If the previous attributes have the same header and type (both are either categorical or numeric), the values are overwritten.
If there are different/additional headers on the "second round" of assignment, these new attributes will be appended to the table.
For numeric attributes, a "blank" value will not override a previous value.
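The three rules above can be modelled in a few lines. This is a minimal sketch of the described behavior, not Partek Flow's implementation: the function name and the dictionary layout are hypothetical, attribute types are assumed to match between rounds, and samples that are not already in the table are ignored (an assumption, since the text does not cover that case).

```python
def merge_attributes(existing: dict, incoming: dict, numeric: set) -> dict:
    """Apply a second round of attribute assignment on top of an existing table.

    Both tables map sample name -> {attribute header: value}. `numeric` is the
    set of headers that are numeric attributes.
    """
    merged = {sample: dict(values) for sample, values in existing.items()}
    for sample, values in incoming.items():
        if sample not in merged:
            continue  # Assumption: unknown samples are skipped.
        for header, value in values.items():
            if header in numeric and value == "":
                # A blank numeric value does not override a previous value.
                continue
            # Same header: the value is overwritten. New header: it is appended.
            merged[sample][header] = value
    return merged


if __name__ == "__main__":
    first = {"S1": {"Treatment": "control", "Age": "34"}}
    second = {"S1": {"Treatment": "treated", "Age": "", "Batch": "B2"}}
    print(merge_attributes(first, second, numeric={"Age"}))
```

In the usage example, Treatment is overwritten, the blank Age keeps its earlier value, and Batch is added as a new attribute.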

Use of attributes as Optional columns in task report tables

The attributes assigned to the samples within the Data Tab will be associated with the samples throughout the project. During the course of analysis, Partek Flow tasks generate various tables, and any attributes associated with a sample can be included in the table as optional columns. An example is shown in Figure 18 for a pre-alignment QA/QC report, where the Optional columns link on the top left of the table reveals the different sample attributes.

Deleting or Renaming samples within a Project

In the Data tab, each sample can be renamed or deleted from the project by clicking the gear icon next to the sample name. The gear icon becomes visible on mouse over (Figure 19). A sample can only be deleted if no analysis has been performed on the data yet. If any analysis has been performed on the data node, the delete sample option is not shown. To exclude certain samples from further analysis, filter samples in the downstream analysis instead. Deleting a sample from a project does not delete the associated files, which will remain on the disk.

The Analyses Tab

After samples have been added to a Partek Flow project and associated with valid data files, a data node will appear in the Analyses tab (Figure 1). The Analyses tab is where different analysis tools and the corresponding reports are accessed.

Data and Task Nodes
The Context Sensitive Menu
Running a Task
Cancelling and Deleting Tasks
Task Results and Task Actions
Layers
Collapsing Tasks
Downloading Data

Data and Task Nodes

The Analyses tab contains two elements: data nodes (circles) and task nodes (rounded rectangles) connected by lines and arrows. Collectively, they represent a data analysis pipeline.

Data nodes (Figure 2) may represent a file imported into the project, or a file generated by Partek Flow as an output of a task (e.g., alignment of FASTQ files generates BAM files).

Figure 2. The Analyses tab showing a data node of unaligned reads

Task nodes (Figure 3) represent the analysis steps performed on the data within a project. For details on the tasks available in Partek Flow, see the specific chapters of this user manual dedicated to the different tasks.

The Context Sensitive Menu

Clicking on a node reveals the context sensitive menu on the right side of the screen.

Only the tasks that are available for the selected data node will be listed in the menu. For data nodes, actions that can be performed on that specific data type will appear.

In Figure 4, a node that contains Unaligned reads is selected (bold black line). The tasks listed are the ones that can be performed on unaligned data (QA/QC, Pre-analysis tools, and Aligners).

After a task is performed on a data node, a new task node is created and connected to the original data node. Depending on the task, a new data node may automatically be generated that contains the resulting data. For details of individual tasks, see Task Menu.

In Figure 5, alignment was performed on the unaligned reads. Two additional nodes were added: a task node for Align reads and an output data node containing the Aligned reads.
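The relationship between data nodes and task nodes can be pictured as a small graph. The following is a conceptual sketch only and does not reflect how Partek Flow stores pipelines: the class and function names are hypothetical, and the example mirrors the alignment case described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str   # "data" (circle) or "task" (rounded rectangle)
    label: str
    outputs: list = field(default_factory=list)


def run_task(data_node: Node, task_label: str, output_label: Optional[str] = None) -> Node:
    """Attach a task node to a data node and return the newly created task node.

    If output_label is given, an output data node is also attached to the task.
    """
    if data_node.kind != "data":
        raise ValueError("tasks are run on data nodes")
    task = Node("task", task_label)
    data_node.outputs.append(task)
    if output_label is not None:
        task.outputs.append(Node("data", output_label))
    return task


def walk(node: Node, depth: int = 0):
    """Yield (depth, node) pairs for the pipeline below and including node."""
    yield depth, node
    for child in node.outputs:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    reads = Node("data", "Unaligned reads")
    run_task(reads, "Align reads", "Aligned reads")
    for depth, node in walk(reads):
        print("  " * depth + f"[{node.kind}] {node.label}")
```

Passing no output label models a task that produces only a report and no new data node.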

Running a Task

To run a task, select a data node and then locate the task you wish to perform from the context sensitive menu. Mouse over to see a description of the action to be performed. Click the specific task, set the additional parameters (Figure 6), and click Finish. The task will be scheduled, the display will refresh, and the screen will return to the project's Analyses tab.

In Figure 6, the STAR aligner was selected and the choices for the aligner index and additional alignment options appeared.

Tasks that are currently running (or scheduled in the queue) appear as translucent nodes. The progress of the task is indicated by the progress bar within the task node. Hovering the mouse pointer over the node will highlight the related nodes (with a thin black outline) and display the status of the task (Figure 7).

If a task is expected to generate data nodes, the expected nodes appear even before the task is completed. They have a lighter shade of color to indicate that they have not yet been generated because the task is still being performed. Once all tasks are done, all nodes will appear in the same shade.

Cancelling and Deleting Tasks

Tasks can only be cancelled by the user who started the task or by the owner of the project. Running or pending tasks can be cancelled by right-clicking the task node and then selecting Cancel (Figure 8). Alternatively, select the task node and choose Cancel task from the context sensitive menu.

A verification dialog will appear (Figure 9) asking you to confirm the task cancellation. Cancelled tasks will remain in the Analyses tab but will be flagged by gray x circles on the nodes (Figure 10).

Data nodes connected to incomplete tasks are also incomplete as no output can be generated (Figure 10).

To delete tasks from the project click the right mouse button on the task node and then select Delete (Figure 11). Alternatively, click the task node and select Delete task from the context sensitive menu.

Task Results and Task Actions

Selecting a task node will reveal a menu pane with two sections: Task results and Task actions (Figure 13).

Items in the Task results section provide information on the action performed in that node. Certain tasks generate a Task report (Figure 14), which includes any tables or charts that the task may have produced.

The Task details page shows detailed diagnostic information about the task (Figure 15). It includes the duration and parameters of the task, lists of input and output files, and the actual commands (in the command line) that were run.

Additionally, the Task details page contains the error logs of unsuccessful runs. The user can download the logs or send them directly to Partek. This page plays an important role in diagnosing and troubleshooting issues related to tasks.

Double clicking on a task node will show the Task report page. However, if no report was generated, the user will be directed to the Task details page.

In the Task actions section, the selected task can be re-run using Re-run w/new parameters. If the task is part of a pipeline that includes additional tasks after it, the Downstream tasks can be re-run as well. Re-running tasks creates a new layer in the Analyses tab.

Another action available for a task node is Add task description (Figure 16), which is a way to add notes to the project. The user can enter a description, which will be displayed when the mouse pointer is hovered over the task node.

Layers

It is common for next-generation sequencing data analysis to examine different task parameters for optimization. Users may want to modify an upstream step (e.g. alignment parameters) and inspect its effect on downstream results (e.g. percent aligned reads).

The implementation of Layers makes optimizations easy and organized. Instead of creating separate nodes in a pipeline, another set of nodes with a different color is stacked on top of previous analyses (Figure 17). To see the parameters that were changed between runs, hover the mouse pointer over the set of stacked task nodes and a pop-up balloon will display them. The text color signifies the layer corresponding to a specific parameter.

Layers are formed when the same task is performed on the same data node more than once. They are also formed when a task node is selected and Re-run w/new parameters is chosen in the context sensitive menu. This allows users to change the options only for the selected task. The user may also choose to re-run the task to which the changes have been made together with all the downstream tasks until the analysis is completed. To do so, select Re-run w/new parameters, downstream tasks from the context sensitive menu.

To select a different layer, use the left mouse button to click on any node of the desired layer. All the nodes associated with the selected layer have the same color and when clicked will be displayed on the top of the stack.

Collapsing Tasks

Adding task nodes and the resulting data nodes to a project may create long pipelines that extend well beyond the border of the canvas (Figure 18).

In that case, the pipeline can be collapsed to hide the steps that are no longer relevant. For example, once single-cell RNA-seq data has been quantified, the Single cell counts data node becomes the new starting point for analysis, as the subsequent analyses will not focus on alignment, UMI deduplication, etc. To start, right-click the task node that should become one boundary of the collapsed portion of your pipeline and select Collapse tasks (Figure 19).

All the tasks on that layer will turn purple. Then left-click the task which should be the other boundary of the collapsed portion. All the tasks that will be collapsed will turn green and a dialog will appear (Figure 19). In the example shown in Figure 19, the tasks between Trim tags and Quantify barcodes will be collapsed. Give the collapsed section a name (up to 30 characters) and select Save (Figure 20).

The collapsed portion of the pipeline is replaced by a single task node with a custom label ("Single cell preprocessing"; Figure 21).

To re-expand the pipeline double click on the task node representing the collapsed portion of the pipeline. Alternatively, single click on the node and select Expand... on the context-sensitive menu. Within the same menu, you can also preview the contents of the collapsed task by selecting View... (Figure 22).

Downloading data

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

The Log Tab

The Log tab contains a table of the tasks that are running, scheduled, or completed within the Partek Flow project (Figure 1). It provides an overview of the task progress, enables task management, and links to detailed reports for each task.

Each row of the table corresponds to a task node in the Analyses tab. The list can be sorted by a specific column using the sort icon.

The Task column lists the name of the tasks. On the left of the task name is a colored circle indicating the layer of this task. The column is searchable by task name. Clicking the task name will open the Task report page. If the task did not generate a report, the link will go to the Task details page.

The User column identifies the task owner. Aside from the user who created the project, only collaborators and users with admin privileges can start tasks in a project. Clicking a name in the User column will display the corresponding User profile.

The End column shows when the task was completed. It will show the actual time for completed tasks, and the estimated time for running tasks. These estimates improve in accuracy as more tasks are completed in the current Partek Flow instance.

The Status column displays the current status of the task, such as Waiting, Running, Done, or Canceled. If the task is currently running, a status progress bar will appear in the column. Once completed, the status of a task will be Done and the End column will be updated with the completion time.

The Project Settings Tab

Project details
Members

Project details

The Project details section shows the Name of the project as well as (optional) a textual project Description and a Thumbnail (picture) (Figure 1).

The owner and collaborators (if any) can customize the Description and Thumbnail entries by pushing the orange Edit project details button (Figure 2). The fields can now be edited to:

Rename the project (names are limited to 30 characters). The original project Name is the one selected when creating the project
Add or change a project description (up to 2000 characters)
Add or change a thumbnail of the project (supported formats are .jpg, .bmp, .gif, .png; the maximum size of the image file is 10 MB)

The Description accepts hyperlinks starting with "http://" or "https://"; clicking such a link opens the website in a new browser tab. The description is also displayed to collaborators and administrators on the Partek® Flow® Home Page. The Choose File button launches a file browser showing the directory structure of the local computer, from which the thumbnail image file can be uploaded. The Clear thumbnail button removes the current thumbnail.

Once all the edits have been made, push Save to accept (or Cancel to reject).

If a thumbnail has been added, it will appear on the Project details tab (Figure 3) and on the home page of Partek Flow, on the Details tab of the project.
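
The limits listed above (30-character name, 2000-character description, four image formats, 10 MB thumbnail) can be checked before opening the edit dialog. The following is a minimal sketch only: `check_project_details` is a hypothetical helper, not part of Partek Flow, and it assumes that 10 MB means 10 × 1024 × 1024 bytes and that file extensions are compared case-insensitively.

```python
import os

# Limits taken from the Project details description above.
MAX_NAME_CHARS = 30
MAX_DESCRIPTION_CHARS = 2000
ALLOWED_EXTENSIONS = {".jpg", ".bmp", ".gif", ".png"}
# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_THUMBNAIL_BYTES = 10 * 1024 * 1024


def check_project_details(name, description, thumbnail_name=None, thumbnail_size=0):
    """Return a list of problems; an empty list means all limits are met."""
    problems = []
    if len(name) > MAX_NAME_CHARS:
        problems.append("name longer than 30 characters")
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append("description longer than 2000 characters")
    if thumbnail_name is not None:
        # Assumption: the extension check ignores letter case.
        extension = os.path.splitext(thumbnail_name)[1].lower()
        if extension not in ALLOWED_EXTENSIONS:
            problems.append("unsupported thumbnail format")
        if thumbnail_size > MAX_THUMBNAIL_BYTES:
            problems.append("thumbnail larger than 10 MB")
    return problems
```

For example, `check_project_details("RNA-Seq 5-AZA", "Tutorial project", "cover.png", 250_000)` returns an empty list.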

Members

The Members section provides an overview of users associated with a particular project and enables project creators (owners) and administrators to add collaborators (Figure 1). A user (without administrator status) has to be specified as a collaborator in a project to be able to access the project in his/her home folder and to perform tasks.

Pushing the pencil icon next to a project member opens one of two dialogs, depending on the status of the member. For a collaborator or a viewer, the pencil icon changes the member's role (e.g. from a Viewer to a Collaborator) (Figure 4).

Moreover, the project owner can transfer the ownership to another user account (one of the accounts already available at the current instance of Partek Flow) using the New owner dropdown list (Figure 5). The previous (old) owner can remain as a project collaborator, with the help of the matching option.

If email notifications are turned on for project ownership transfer, an email dialog box appears. This can be used to add additional text to the notification email body (Figure 6).

The Attachments Tab

The Attachments tab allows the project owner to add external (i.e. non-Partek Flow) files to a project (for instance, spreadsheets, Word documents, or manuscripts). To attach a file, go to the Attachments tab (Figure 1). The Choose File button opens a file browser showing the directory structure of the local computer. Select the file that you want to attach and then click the Upload attachment button. For security reasons, Partek Flow will not allow you to add an executable file.

All added files will be listed in the table under the Attachments tab (Figure 2). The tab will also display file sizes, the user name of the person who uploaded the file and the time it was uploaded. Note that uploaded files will count towards the total size of the project, and thus, if available, to the disk quota of the project owner.

To remove a file, click the remove icon. To download an attachment, click the download icon.

Project Management

Project Deletion

A project may be deleted from the Partek Flow server using the button on the upper right side of the Project View page (Figure 1).

Selecting Files for Deletion

Project Import and Export

Every project can be exported before it is removed from the server. By exporting old projects, you can free up some storage on your server. You can import the exported project back into Partek Flow later if needed.

Exporting a Project

When you export a project, you will be asked whether to include library files in the export. If you choose Yes, the versions of the library files used in the project will be archived, so you can reproduce the results when you later import the project and redo the analysis. However, this makes the archive larger. If you choose No, the library files will not be exported. Note that when you import the project later, you can only use the versions of the required library files that are available at that time to redo the analysis, and the results might not be the same.

Importing a Project

The Import project option is under the Projects drop-down menu at the top of the Partek Flow page (Figure 5). It can be accessed from any Partek Flow page.

The input of this option is the zipped file of the exported project. Browse to the file location which can either be the Partek Flow server, a local machine, or a URL. The zip file first needs to be uploaded to the Partek Flow server (if it is not on the server already), and then Partek Flow will unpack the zip file into a project. The project name will be the same as the exported project name. If the project with the same name already exists, the imported project will have a number appended to it (e.g., ProjectName_1).

The owner of the imported project will be the user that imported it.
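
The naming rule for imported projects can be illustrated with a short sketch. It is an assumption that the appended number keeps increasing (ProjectName_1, ProjectName_2, ...) until an unused name is found; the text above only gives ProjectName_1 as an example, and `imported_project_name` is a hypothetical helper rather than a Partek Flow function.

```python
def imported_project_name(name, existing_names):
    """Return the name an imported project would receive under the assumed rule."""
    if name not in existing_names:
        # No clash: the exported project name is kept unchanged.
        return name
    suffix = 1
    # Assumption: the counter increases until an unused name is found.
    while f"{name}_{suffix}" in existing_names:
        suffix += 1
    return f"{name}_{suffix}"
```

For example, importing "ProjectName" onto a server that already holds a project called "ProjectName" would yield "ProjectName_1" under this rule.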

Importing a GEO / ENA project

How to import a study from GEO / ENA

If a project is publicly available in the Gene Expression Omnibus (GEO) and European Nucleotide Archive (ENA) databases, you can import associated FASTQ files, sample attributes, and project details automatically into Partek Flow.

Click Projects at the top of the page
Click Import project

Choose GEO / ENA project for Select files from
Type the BioProject ID or the GEO Accession number

The format of a BioProject ID is PRJNA followed by one to six digits (e.g., PRJNA291540). The format of a GEO Accession number is GSE followed by one to five digits (e.g., GSE71578).

Click Import project at the bottom

The Analyses tab will include an Unaligned reads data node once the data download has started (Figure 3). It may take a while for the download to complete depending on the size of the data. FASTQ files are downloaded from the ENA BioProject page.
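
Before starting an import, it can help to check that an identifier matches one of the two formats described above. The sketch below encodes exactly those rules (PRJNA plus one to six digits, GSE plus one to five digits) as regular expressions. `classify_accession` is a hypothetical helper for illustration; it does not check whether the study exists or is public, and Partek Flow may accept or reject identifiers differently.

```python
import re

# Patterns follow the ID formats stated in this section.
BIOPROJECT_RE = re.compile(r"PRJNA[0-9]{1,6}")
GEO_SERIES_RE = re.compile(r"GSE[0-9]{1,5}")


def classify_accession(text):
    """Return "BioProject", "GEO", or None if neither format matches."""
    value = text.strip()  # tolerate stray whitespace from copy and paste
    if BIOPROJECT_RE.fullmatch(value):
        return "BioProject"
    if GEO_SERIES_RE.fullmatch(value):
        return "GEO"
    return None
```

A sample-level run identifier such as SRR592573 matches neither pattern, so the function returns `None` for it.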

Common Issues

Error Message - The project did not yield any data. Double-check the project ID, or try importing the data manually

If the study is not publicly available in both GEO and ENA, project import will not succeed.

The project was imported, but the Analyses tab is empty and there are no FASTQ files

If there is an ENA project, but the FASTQ files are not available through ENA, the project will be created, but data will not be imported.

Something is missing or the import failed

A variety of other issues and irregularities can cause imports to not succeed or partially succeed, including, but not limited to, a BioProject having multiple associated GSE IDs, incomplete information on the GEO or ENA page, and either the GEO or ENA project not being publicly available.

FAQ

What are GEO and ENA?

The Gene Expression Omnibus (GEO) and the European Nucleotide Archive (ENA) are web-accessible public repositories for genomic data and experiments. Access and learn more about their resources at their respective websites:

GEO - https://www.ncbi.nlm.nih.gov/geo/
ENA - https://www.ebi.ac.uk/ena

How do I know if a GEO project is also in ENA?

You can search ENA using the GEO ID (e.g., GSE71578) to check if there is a matching ENA project (Figure 6).

Bulk RNA-Seq

This tutorial gives an overview of RNA-Seq analysis with Partek Flow. It will guide you through creating an RNA-Seq analysis pipeline. The goals of the analysis are to create a list of differentially expressed genes, visualize these gene expression signatures by hierarchical clustering, and interpret the gene lists using gene ontology (GO) enrichment.

This tutorial will illustrate:

Description of the Data Set

This tutorial uses a subset of the data set published in Xu et al. 2013 (PMID: 23902433). In the experiment, mRNA was isolated from HT29 colon cancer cells treated with the drug 5-aza-deoxy-cytidine (5-aza) at three different doses: 0μM (control), 5μM, or 10μM. The mRNA was sequenced using Illumina HiSeq (paired end reads). The goal of the experiment was to identify differentially expressed genes between the different treatment groups.

Importing the tutorial data set

The tutorial data set includes 9 samples equally divided into 3 treatment groups. Sequencing was performed by an Illumina HiSeq (paired-end reads), but the workflow can be easily adapted for data generated by other sequencers. Each sample has 2 FASTQ files for a total of 18 FASTQ files.

You can obtain the tutorial data set through Partek Flow.

Click your avatar
Click Settings in the drop-down menu (Figure 1)

At the top of the System information page, there is a section labeled Download tutorial data (Figure 2).

Click RNA-Seq 5-AZA to download the tutorial data set

A new project will be created and you will be directed to the Analyses Tab. The data will be downloaded automatically (Figure 3) and imported into your project. Because this is a tutorial project, there is no need to click on Add data as it will be done automatically.

At first the project is empty, but the file download will start automatically in the background. You can wait a few minutes then refresh your browser or you can monitor the download progress using the Queue.

Click Queue
Click View Queued Tasks in the drop-down menu

The Queued tasks page will open (Figure 4).

Click Projects
Click RNA-Seq 5-AZA in the drop-down menu

The Analyses tab will open (Figure 5). If your download has completed, you will see a blue circle titled mRNA.

Once the download completes, the sample table will appear in the Metadata tab.

Click the Metadata tab

The Metadata tab includes the sample table with the names of each imported sample (Figure 6).

In the next section of the tutorial, we will add a sample attribute that indicates the treatment group of each sample.

Adding sample attributes

Attributes describe samples. Examples of sample attributes include treatment group, age, sex, and time point. Attributes can be added individually in the Metadata tab or in bulk using a text file. In this tutorial, we will add one attribute, 5-AZA Dose, manually.

Click the Metadata tab
Click Manage under Sample attributes (Figure 1)

Click Add new attribute (Figure 2)

To configure a new attribute, at Name, type in 5-AZA Dose as the name of the attribute
Click Add to add 5-AZA Dose as a categorical, project-specific attribute (Figure 3)

Name the first New category 0uM
Click the green plus icon to add category (Figure 4)
Repeat for two additional categories, 5uM and 10uM (Figure 5)

Click Back to metadata tab

The data table now includes an Attribute column for 5-AZA Dose (Figure 6). Next, we need to assign samples attribute categories for 5-AZA Dose.

Select Assign values

The option to edit the 5-AZA Dose field for each sample will appear as a drop-down menu (Figure 7).

Select the 5-AZA Dose text box for a sample to bring up a drop-down menu with the 5-AZA Dose attribute categories (0uM, 5uM, 10uM)
Use the drop-down menus to add a treatment group for each sample

The first three samples (SRR592573-5) should be 0uM, the next three samples (SRR592576-8) should be 5uM, and the final three samples (SRR592579-81) should be 10uM (Figure 8).

Click Apply changes

The data table will now show a 5-AZA Dose attribute for each sample.
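
As noted at the start of this section, attributes can also be added in bulk using a text file. The sketch below builds the same sample-to-dose assignment as the manual steps and writes it as tab-delimited text. The layout is an assumption (one header row, a "Sample name" column, and one column per attribute); check the Metadata tab documentation for the exact format that Partek Flow expects before uploading such a file.

```python
import csv
import io

DOSES = ["0uM", "5uM", "10uM"]


def dose_table():
    """Pair the nine tutorial samples (SRR592573 to SRR592581) with their dose."""
    rows = []
    for offset in range(9):
        sample = f"SRR{592573 + offset}"
        # Three consecutive samples belong to each treatment group.
        rows.append((sample, DOSES[offset // 3]))
    return rows


def to_tsv(rows):
    """Render the table as tab-delimited text (assumed column headers)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample name", "5-AZA Dose"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tsv(dose_table()), end="")
```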

Running pre-alignment QA/QC

With attributes added, we can begin building our pipeline.

Click the Analyses tab

In the Analyses tab, data are represented as circles, termed data nodes. One data node, mRNA, should be visible in the Analyses tab (Figure 1).

Click the mRNA node

Clicking a data node brings up the context-sensitive task menu with tasks that can be performed on the data node (Figure 2).

Pre-alignment QA/QC assesses the quality of the unaligned reads and will help us determine whether trimming or filtering is necessary.

Click Pre-alignment QA/QC in the QA/QC section of the task menu
Click Finish to run the task with default settings

Running a task creates a task node, e.g. the blue rectangle labeled Pre-alignment QA/QC (Figure 3), which contains details on the task and a report. While tasks are queued or in progress, they have a lighter color. Any output nodes that the task will generate are also displayed in a lighter color until the task completes. Once the task begins running, a progress bar is displayed on the task node.

Click the Pre-alignment QA/QC node

The context-sensitive task menu (Figure 4) shows the option to view the Task report and the Task details. You can also access a task report by double-clicking on a task node.

Click Task report

Pre-alignment QA/QC provides information about the sequencing quality of unaligned reads (Figure 5). Both project-level summaries and sample-level summaries are provided.

Click sample SRR592573 in the data table of the report to open its sample-level report

The Average base quality score per position graph in the upper right-hand panel (Figure 6) gives the average Phred score for each position in the reads.

A Phred score is a measure of base call accuracy with a higher score indicating greater accuracy.

By convention, a score above 20 is considered adequate. As you can see, the standard error bars in the graph show that some reads have quality scores below 20 for some of their base pair calls near the 3' end.
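
To make the threshold concrete: a Phred score Q is defined from the estimated probability P that a base call is wrong as Q = -10 · log10(P), so a score of 20 corresponds to a 1 in 100 chance of error and a score of 30 to 1 in 1000. The short sketch below converts in both directions; the function names are chosen here for illustration.

```python
import math


def phred_to_error_probability(quality):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Phred score for a given base-call error probability (0 < p <= 1)."""
    return -10 * math.log10(probability)


if __name__ == "__main__":
    for q in (10, 20, 30, 35):
        print(q, phred_to_error_probability(q))
```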

Based on the results of Pre-alignment QA/QC, while most of the reads are high quality, we will need to perform read trimming and filtering. For more about the contents of the task report, please see the Pre-alignment QA/QC user guide.

Click RNA-Seq 5-AZA to return to the Analyses tab

Trimming bases and filtering reads

Based on pre-alignment QA/QC, we need to trim low quality bases from the 3' end of reads.

Click the Unaligned reads data node
Click Pre-alignment tools in the task menu
Click Trim bases (Figure 1)

By default, Trim bases removes bases starting at the 3' end and continuing until it finds a base call with a Phred score equal to or greater than 35 (Figure 2).

Click Finish to run Trim bases with default settings

The Trim bases task will generate a new data node, Trimmed reads (Figure 3). We can view the task report for Trim bases by double-clicking either the Trim bases task node or the Trimmed reads data node or choosing Task report from the task menu.

Double-click the Trimmed reads data node to open the task report

The report shows the percentage of trimmed reads and reads removed in a spreadsheet and two graphs (Figure 4).

The results are fairly consistent across samples with ~2% of reads untrimmed, ~86% trimmed, and ~12% removed for each. The average quality score for each sample is increased with higher average quality scores at the 3' ends.
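
The default trimming rule described above (remove bases from the 3' end until a base with a Phred score of at least 35 is reached) can be sketched in a few lines. This illustrates the rule as stated in this tutorial, not Partek Flow's actual implementation; the function name and the list-of-integers quality representation are assumptions made for the example.

```python
def trim_three_prime(sequence, qualities, min_quality=35):
    """Trim low-quality bases from the 3' end of a single read.

    sequence  -- the read as a string of bases
    qualities -- one integer Phred score per base, in the same order
    """
    end = len(qualities)
    # Walk backwards from the 3' end and stop at the first base
    # whose score reaches the threshold.
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]
```

For a read `ACGTAC` with scores `[38, 37, 36, 35, 20, 10]`, the last two bases are removed and `ACGT` is kept. A read with no base at or above the threshold is trimmed to an empty string.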

Click RNA-Seq 5-AZA to return to the Analyses tab

Aligning to a reference genome

With our reads trimmed, we now have high-quality reads for each sample. The next step is to align the reads to a reference genome. Alignment matches each of the short sequencing reads to a location in the reference genome.

Click Trimmed reads
Click Aligners in the task menu to display available aligners (Figure 1)

Partek Flow offers a variety of different aligners. Mouse over any option for a short description. For this tutorial, we will use STAR, a fast and accurate aligner commonly used for RNA-Seq data. For more information about STAR and the other aligners, please consult the Aligners user guide.

Click STAR

The STAR aligner options allow us to select the genome build (assembly) and index. For this tutorial, our data set contains only reads that map to chromosome 22 to minimize the time required for resource-intensive tasks, such as alignment.

Click Finish to run with hg19 selected for Assembly and Whole genome for the Aligner index (Figure 2)

Alignment is a resource-intensive task and may take around 20 minutes to complete, even when mapping only reads from a single chromosome. Task and data nodes that have been queued, but not completed, are shown in a lighter color than completed tasks (Figure 3).

The Align reads task generates an Aligned reads data node once complete. You can wait for the alignment task to finish or you can continue building the pipeline while the results of alignment are pending; additional tasks can be added to the pipeline and queued before the current task has completed.

Running post-alignment QA/QC

After alignment has completed, we can view the quality of alignment by performing post-alignment QA/QC.

Click the Aligned reads data node
Click QA/QC in the task menu
Click Post-alignment QA/QC from the QA/QC section of the task menu (Figure 1)

A Post-alignment QA/QC task node will be generated (Figure 2).

Double-click the Post-alignment QA/QC task node to view the task report

Similar to the Pre-alignment QA/QC task report, general quality information about the whole data set is displayed and sample-level reports can be opened by clicking a sample name in the table.

The top two graphs in the data set view (Figure 3) show the alignment breakdown and coverage.

From these graphs, we can see that more than 95% of reads were aligned, but the total number of reads for each sample varies. Normalizing for the variability in total read counts will be addressed in a later section of the tutorial.

Quantifying to an annotation model

RNA-Seq uses the number of sequencing reads per gene or transcript to quantify gene expression. Once reads are aligned to a reference genome, we need to assign each read to a known transcript or gene to give a read-count per transcript or gene.

Click the Aligned reads data node
Click Quantification in the task menu

We will use Partek E/M to quantify reads to an annotation model in this tutorial. For more information about the other quantification options, please see the user guide.

Click Quantify to an annotation model (Partek E/M) (Figure 1)

Click Finish (Figure 2)

The Quantify to annotation model task node outputs two data nodes, Gene counts and Transcript counts (Figure 3).

To view the results of quantification, we can select either data node output.

Double-click the Gene counts data node to view the task report

The task report details the number of reads within exons, introns, and intergenic regions. For detailed information about the quantification results, see the Quantify to annotation model (Partek E/M) user guide.

Filtering features

Low expression genes may be indistinguishable from noise and will decrease the sensitivity of differential expression analysis.

Click the Gene counts node
Click Filtering in the task menu
Click Filter features (Figure 1)

Click Noise reduction filter
Set the filter to maximum <= 10
Click Finish (Figure 2)

A new Filtered counts node will be created (Figure 3).
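
The effect of the noise reduction filter can be sketched as follows, assuming that the setting maximum <= 10 excludes every feature whose highest count across all samples is 10 or less. The function name and the dictionary layout are hypothetical and are used only for this illustration.

```python
def noise_reduction_filter(counts, threshold=10):
    """Keep only features whose maximum count across samples exceeds the threshold.

    counts -- dict mapping a feature (gene) name to a list of per-sample counts
    """
    return {
        feature: values
        for feature, values in counts.items()
        # Assumption: features with max <= threshold are the ones removed.
        if max(values) > threshold
    }


if __name__ == "__main__":
    example = {
        "GENE_A": [0, 3, 10],    # maximum is 10, so it is removed
        "GENE_B": [0, 0, 11],    # maximum is 11, so it is kept
        "GENE_C": [250, 310, 280],
    }
    print(sorted(noise_reduction_filter(example)))
```

Note that a feature is kept as soon as one sample exceeds the threshold, so genes expressed in only one treatment group are not lost.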

Normalizing counts

Because different samples have different total numbers of reads, it would be misleading to calculate differential expression by comparing read count numbers for genes across samples without normalizing for the total number of reads.

Click the Filtered counts data node
Click Normalization and scaling in the task menu
Click Normalization (Figure 1)

The Count normalization menu will open (Figure 2).

Normalization can be performed by sample or by feature. By sample is selected by default; this is appropriate for the tutorial data set.

Available normalization methods are listed in the left-hand panel. For more information about these options, please see the Normalize counts user guide.

For this tutorial, we will use the recommended default normalization settings.

This adds the Median ratio normalization method, which is suitable for performing differential expression analysis using DESeq2 (Figure 3).

Click Finish to perform normalization

A Normalize counts task node and a Normalized counts data node are added to the pipeline (Figure 4).
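
For readers curious about what a median ratio normalization does, here is a minimal sketch of the median-of-ratios approach used by DESeq-style methods: each gene's counts are compared with that gene's geometric mean across samples, and the median of those ratios within a sample becomes the sample's size factor. This is an illustration under that assumption; Partek Flow's Median ratio implementation may differ in detail, and the function names are hypothetical.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Compute one size factor per sample.

    counts -- list of genes, each a list of raw per-sample counts
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero count have no usable geometric mean; skip them.
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio in each sample is that sample's size factor.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = median_ratio_size_factors(counts)
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]
```

If one sample was sequenced exactly twice as deeply as another, its size factor comes out twice as large, and the normalized counts of the two samples agree. The sketch raises an error if every gene contains a zero count, since no ratios are then available.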

Exploring the data set with PCA

The principal components analysis (PCA) scatter plot allows us to visualize similarities and differences between the samples in a data set.

Click the Normalized counts data node
Click Exploratory analysis in the task menu
Click PCA
Click Finish to run PCA with the default options

The PCA task node will be added to the pipeline (Figure 1).

Double click the PCA data node to open the PCA scatter plot (Figure 2)

In the Data Viewer, click Style under Configure and set the Color by drop-down to 5-AZA Dose. The scatter plot shows each sample as a sphere, colored by treatment group, in a three dimensional plot. The x, y, and z axes are the first three principal components. The percentage of total variance explained by each is listed next to the axis label. The size of each axis is determined by the variance along that axis. The plot is fully interactive; it can be rotated and points selected.

Here, we can see that samples separate based on treatment, but there is noticeable separation within treatment groups, particularly the 0μM and 10μM treatment groups.

Performing differential expression analysis with DESeq2

After normalizing the data, we can perform differential analysis to identify genes that are differentially expressed based on treatment.

Click the Normalized counts node
Click Statistics in the task menu
Click Differential analysis in the task menu (Figure 1)

Check 5-AZA Dose and click Add factors to add the attribute to the statistical model.

Select Next to continue with 5-AZA Dose as the selected attribute

The Comparisons page will open (Figure 4).

It is easiest to think about comparisons as the questions we are asking. In this case, we want to know what are the differentially expressed genes between untreated and treated cells. We can ask this for each dose individually and for both collectively.

The upper box will be the numerator and the lower box will be the denominator in the comparison calculation so we will select the 0μM control in the lower box.

Drag 5μM to the upper box
Drag 0μM to the lower box
Click Add comparison to add 5μM vs. 0μM to the comparison table (Figure 5)

Repeat to create comparisons for 10μM vs. 0μM and 10μM,5μM vs. 0μM (Figure 6)

Click Finish to perform DESeq2 as configured

A DESeq2 task node and a DESeq2 data node will be added to the pipeline (Figure 7).

Viewing DESeq2 results and creating a gene list

Once we have performed DESeq2 to identify differentially expressed genes, we can create a list of significantly differentially expressed genes using cutoff thresholds.

Double click the Feature list data node to open the task report

The task report spreadsheet will open showing genes in rows and the DESeq2 results in columns (Figure 1).

To get a sense of what filtering thresholds to set, we can view a volcano plot for a comparison.

A volcano plot will open showing p-value on the y-axis and fold-change on the x-axis (Figure 2). If the gene labels are on (not shown), click on the plot to turn them off.

Thresholds for the cutoff lines are set using the Statistics card (Configuration panel > Configure > Statistics). The default thresholds are a fold change of 2 in either direction (-2 and 2) for the X axis and a p-value of 0.05 for the Y axis.

Switch to the browser tab showing the DESeq2 report
Click FDR step up
Click the triangle next to FDR step up to open the FDR step up options
Leave All contrasts selected
Set the cutoff value to 0.05. Hit Enter.

This will include genes that have an FDR step up value of less than or equal to 0.05 for all three contrasts, 5μM vs. 0μM, 10μM vs. 0μM and 5μM:10μM vs. 0μM. FDR step up is the false discovery rate adjusted p-value used by convention in microarray and next generation sequencing data sets in place of the unadjusted p-value.
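As a rough illustration of what a step-up FDR adjustment does, the sketch below implements the Benjamini-Hochberg procedure. That FDR step up corresponds to Benjamini-Hochberg is an assumption here, and the function name and p-values are hypothetical.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical unadjusted p-values for five genes
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = fdr_step_up(p_values)
print([round(a, 5) for a in adjusted])  # [0.005, 0.02, 0.05125, 0.05125, 0.6]
```

In this example, two genes with unadjusted p-values below 0.05 (0.039 and 0.041) no longer pass a 0.05 cutoff after adjustment.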

Click Fold-change
Click the triangle next to Fold-change to open the Fold-change options
Leave All contrasts selected
Set to From -2 to 2 with Exclude range selected. Hit Enter.

Note that the number of genes that pass the filter is listed at the top of the filter menu next to Results: and will update to reflect any changes to the filter. Here, 27 genes pass the filter (Figure 3). Depending on your settings, the number may be slightly different.

This creates a Filter list task node and a Filtered feature list data node (Figure 4).
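The combined effect of the two filters can be sketched as follows. This is a simplified illustration with hypothetical gene names and values. It assumes fold changes are reported on a signed scale (negative values for down-regulation), that All contrasts means every contrast must pass, and that values exactly at -2 or 2 are kept.

```python
def passes_filter(gene, fdr_cutoff=0.05, fc_low=-2.0, fc_high=2.0):
    """True if every contrast has FDR <= cutoff and a fold change outside the excluded range."""
    for contrast in gene["contrasts"].values():
        if contrast["fdr"] > fdr_cutoff:
            return False
        # Exclude range: drop fold changes strictly between fc_low and fc_high
        if fc_low < contrast["fold_change"] < fc_high:
            return False
    return True

# Hypothetical results table with two contrasts per gene
genes = [
    {"name": "GENE_A", "contrasts": {"5 vs 0": {"fdr": 0.01, "fold_change": 3.1},
                                     "10 vs 0": {"fdr": 0.02, "fold_change": 2.5}}},
    {"name": "GENE_B", "contrasts": {"5 vs 0": {"fdr": 0.01, "fold_change": 1.4},
                                     "10 vs 0": {"fdr": 0.01, "fold_change": 2.2}}},
    {"name": "GENE_C", "contrasts": {"5 vs 0": {"fdr": 0.20, "fold_change": -4.0},
                                     "10 vs 0": {"fdr": 0.01, "fold_change": -4.5}}},
    {"name": "GENE_D", "contrasts": {"5 vs 0": {"fdr": 0.03, "fold_change": -2.6},
                                     "10 vs 0": {"fdr": 0.04, "fold_change": -3.0}}},
]

filtered = [g["name"] for g in genes if passes_filter(g)]
print(filtered)  # ['GENE_A', 'GENE_D']
```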


Viewing a dot plot for a gene

In addition to the volcano plot showing all genes, we can view expression levels of each gene on a dot plot.

Double-click the Filtered feature list data node to open the task report
Click the FDR step up header in the 5μM vs. 0μM section to sort by ascending FDR step up

In the task report table, there is a column labeled View with three icons in each row.

Select the dot plot icon in the View column to open a dot plot for the gene SELENOM

The dot plot for SELENOM (Figure 1) shows each sample as a point with normalized reads on the y-axis. Samples are separated and colored by treatment group.


Analyzing Single Cell RNA-Seq Data

Filtering cells
Filter features
Normalization
PCA
Graph-based clustering
t-SNE
Coloring the t-SNE scatter plot
Selecting cells on the t-SNE scatter plot
Filtering cells on the t-SNE scatter plot
Classifying cells
Comparing gene expression between cell types
Generating a heatmap
Performing enrichment analysis
Pipeline

This tutorial presents an outline of the basic series of steps for analyzing a single cell RNA-Seq experiment in Partek Flow starting with the count matrix file.

This tutorial includes only one sample, but the same steps will be followed when analyzing multiple samples. For notes on a few aspects specific to a multi-sample analysis, please see our Single Cell RNA-Seq Analysis (Multiple Samples) tutorial.

If you are new to Partek Flow, please see Getting Started with Your Partek Flow Hosted Trial for information about data transfer and import and Creating and Analyzing a Project for information about the Partek Flow user interface.

Filtering cells

An important step in analyzing single cell RNA-Seq data is to filter out low quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. You can do this in Partek Flow using the Single cell QA/QC task.

Click on the Single cell data node
Click on the QA/QC section of the task menu
Click on Single cell QA/QC

A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running (Figure 1).

Click the Single cell QA/QC node once it finishes running
Double-click the Task report in the task menu

The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 2).

There can be four plots: number of read counts per cell, number of detected genes per cell, the percentage of mitochondrial reads per cell, and the percentage of ribosomal counts.

The plots will be shaded to reflect the selection. Cells that are excluded will be shown as dim dots on all plots.

The read counts per cell and number of detected genes per cell are typically used to filter out potential doublets: if a cell has an unusually high number of total counts or detected genes, it may be a doublet. The mitochondrial reads percentage can be used to identify cells damaged during cell isolation: if a cell has a high percentage of mitochondrial counts, it is likely damaged or dying and may need to be excluded.
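The quality measures described above can be sketched in a few lines. This is an illustration only: the function names, the thresholds, and the use of the "MT-" prefix to recognize mitochondrial genes (the human gene naming convention) are assumptions, not the settings used by the Single cell QA/QC task.

```python
def qc_metrics(counts, mito_prefix="MT-"):
    """counts: dict mapping gene name to the count observed in one cell."""
    total = sum(counts.values())
    detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"total_counts": total, "detected_genes": detected, "pct_mito": pct_mito}

def keep_cell(metrics, min_counts, max_counts, max_genes, max_pct_mito):
    """Keep cells that are neither too sparse, too rich (possible doublets), nor damaged."""
    return (min_counts <= metrics["total_counts"] <= max_counts
            and metrics["detected_genes"] <= max_genes
            and metrics["pct_mito"] <= max_pct_mito)

# Two hypothetical cells
healthy = {"ACTB": 90, "CD79A": 5, "CD3D": 0, "MT-CO1": 5}
damaged = {"ACTB": 38, "CD79A": 0, "CD3D": 2, "MT-CO1": 60}

for name, cell in [("healthy", healthy), ("damaged", damaged)]:
    m = qc_metrics(cell)
    print(name, m, keep_cell(m, min_counts=50, max_counts=1000, max_genes=500, max_pct_mito=20.0))
```

Appropriate thresholds depend on the data set, which is why the report's violin plots are worth inspecting before filtering.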

Filter features

A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes (features). Because there is no gold standard for what makes a gene informative, and ideal gene filtering criteria depend on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options. The Filter features step can be performed either before or after normalization.

Click the data node containing the count matrix
Click Filtering in the task menu
Click Filter features

There are four categories of filter available - noise reduction, statistics based, feature metadata, and feature list.

The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.

For example, you can use a noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file.

Click the Noise reduction filter check box
Set the Noise reduction filter to Exclude features where value <= 0 in at least 99.9% of cells using the drop-down menus and text boxes
Click Finish to apply the filter (Figure 3)

The resulting data node, Filtered counts, will be the starting point for the next stage of analysis.
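The logic of the noise reduction filter configured above can be sketched as follows. The function name and the small example matrix are hypothetical; the rule itself (exclude features where value <= 0 in at least 99.9% of cells) is taken from the step above.

```python
def noise_reduction_filter(matrix, threshold=0, min_fraction=0.999):
    """matrix: dict mapping gene name to a list of per-cell values.

    Returns only the genes that are NOT at or below the threshold
    in at least min_fraction of the cells.
    """
    kept = {}
    for gene, values in matrix.items():
        fraction_low = sum(1 for v in values if v <= threshold) / len(values)
        if fraction_low < min_fraction:
            kept[gene] = values
    return kept

# Hypothetical count matrix with four cells
matrix = {
    "GENE_A": [0, 0, 0, 0],  # never expressed, excluded
    "GENE_B": [0, 3, 0, 0],  # expressed in one cell, kept
    "GENE_C": [1, 2, 0, 5],  # expressed in most cells, kept
}

print(list(noise_reduction_filter(matrix)))  # ['GENE_B', 'GENE_C']
```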

Normalization

Because different cells will have a different number of total counts, it is important to normalize the data prior to downstream analysis. For droplet-based single cell isolation and library preparation methods that use a 3' counting strategy, where only the 3' end of each transcript is captured and sequenced, we recommend the following normalization - 1. CPM (counts per million), 2. Add 1, 3. Log2. This accounts for differences in total UMI counts per cell and log transforms the data, which makes the data easier to visualize.
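The three recommended steps can be written out for a single cell as a short sketch. The function name and example counts are hypothetical; the order of operations (CPM, add 1, log2) follows the recommendation above.

```python
import math

def normalize_cell(counts):
    """Apply CPM, add 1, then log2 to the raw counts of one cell."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cell has no counts")
    return [math.log2(c / total * 1_000_000 + 1) for c in counts]

# Two hypothetical cells with the same expression profile but a 10x difference in depth
deep_cell = [0, 250, 750]
shallow_cell = [0, 25, 75]

print(normalize_cell(deep_cell))
print(normalize_cell(shallow_cell))
```

Both cells produce the same normalized values, which is the point of scaling by total counts. Adding 1 before the log transform keeps zero counts at zero instead of producing an undefined value.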

Click the Filtered counts results node produced by the Filter features task
Click Normalization and scaling in the context-sensitive task menu on the right
Click Normalization

This adds CPM (counts per million), Add 1, and Log2 to the Normalization order panel. Normalization steps are performed in descending order.

Click Finish to apply the normalization (Figure 4)

A new Normalized counts data node will be produced. You can choose to change the color of this node by right-clicking on the task node then clicking Change color and/or rename the result node by right-clicking and selecting Rename data node.

In the example below, I have changed the color to dark blue and renamed the results node based on the scheme.

For more information on normalizing data in Partek Flow, please see the Normalize Counts section of the user manual.

PCA

Principal components (PC) analysis (PCA) is an exploratory technique that is used to describe the structure of high dimensional data by reducing its dimensionality. Because PCA is used to reduce the dimensionality of the data prior to clustering as part of a standard single cell analysis workflow, it is useful to examine the results of PCA for your data set prior to clustering.

Click the Filtered counts node
Click Exploratory analysis in the task menu
Click PCA from the drop-down list

You can choose Features contribute equally to standardize the genes prior to PCA or allow more variable genes to have a larger effect on the PCA by choosing by variance. By default, we take variance into account and focus on the most variable genes.

If you have multiple samples, you can choose to run PCA for each sample individually or for all samples together by selecting or not selecting the Split by sample option (Figure 5).

Click Finish to run

A new PCA task node will be produced.

Double-click the PCA task node to open the 3D PCA scatter plot in data viewer (Figure 6)

Besides the PCA coordinates of the cells, the PCA task report also includes the Scree plot, the component loadings table, and the PC projections table.

The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured as an eigenvalue. The higher the eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by each additional PC is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering, UMAP and t-SNE.

Note that Partek Flow suggests appropriate data for each plot type that is chosen so only PCA results will be available to select from for the Scree plot.

Mouse over the Scree plot to identify the point where additional PCs offer little additional information

In this data set, a reasonable cut-off could be set anywhere between 7 and 20 PCs.
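Reading a Scree plot is a visual judgment, but the idea can be approximated numerically. The sketch below converts eigenvalues to the fraction of variance explained and counts the leading PCs that each explain at least a chosen fraction. The eigenvalues, the function names, and the 5% threshold are all hypothetical; this is one simple heuristic, not the method Partek Flow uses.

```python
def explained_variance(eigenvalues):
    """Fraction of total variance explained by each PC."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def n_informative_pcs(eigenvalues, min_gain=0.05):
    """Count leading PCs that each explain at least min_gain of the total variance."""
    n = 0
    for fraction in explained_variance(eigenvalues):
        if fraction < min_gain:
            break
        n += 1
    return n

# Hypothetical eigenvalues, sorted from the first PC to the last
eigenvalues = [40.0, 25.0, 15.0, 8.0, 4.0, 3.0, 3.0, 2.0]

print(n_informative_pcs(eigenvalues))  # 4
```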

Viewing the genes correlated with each PC can be useful when choosing how many PCs to include.

To display the PCA projections table, click on the Table drop-down list in the Content icon under Configure and choose PCA projections (Figure 9)

In the PCA projections table, each row is an observation (a cell in this case) and each column represents one principal component (Figure 10). This table can be downloaded as a text file, the same way as the component loadings table.

Graph-based clustering

Graph-based clustering identifies groups of similar cells using PC values as the input. By including only the most informative PCs, noise in the data set is excluded, improving the results of clustering.

Click the PCA data node
Click Exploratory analysis in the task menu
Click Graph-based clustering

Clustering can be performed on each sample individually or on all samples together. Here, we are working with a single sample.

Check Compute biomarkers to compute features that are highly expressed when comparing each cluster (Figure 11)
Click Configure to access the Advanced options and change the Number of nearest neighbors to 50 and Nearest Neighbor Type to K-NN for this example tutorial.

The Number of principal components should be set based on your examination of the Scree plot and component loadings table. The default value of 100 is likely exhaustive for most data sets, but may introduce noise that reduces the number of clusters that can be distinguished.

Click Finish to run the task

New Graph-based clusters and Biomarkers data nodes will be generated along with the task nodes.

Double-click the Graph-based clusters node to see the cluster results and statistics (left screenshot on Figure 12)
Double-click the Biomarkers node to see the computed biomarkers if you have selected this option (right screenshot on Figure 12)

The Graph-based clustering result lists the Total number of clusters and the proportion of cells that fall into each cluster, as well as the Maximum modularity, a measurement of the quality of the clustering result where optimal modularity is 1. The Biomarkers node includes the top features for each graph-based cluster. It displays the top-10 genes that distinguish each cluster from the others. The Download option at the bottom right of the table can be used to view and save more features. These are calculated using an ANOVA test comparing the cells in each group to all the other cells, filtering to genes that are 1.5 fold upregulated, and sorting by ascending p-value. This ensures that the top-10 genes of each cluster are highly and disproportionately expressed in that cluster.
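To give a sense of what modularity measures, the sketch below computes it for a small unweighted graph using the standard Newman definition. That Partek Flow reports this exact quantity is an assumption, and the graph and cluster labels are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (node, node) pairs; community: dict mapping node to cluster label.
    """
    m = len(edges)
    internal = {}    # number of edges with both ends inside each cluster
    degree_sum = {}  # sum of node degrees inside each cluster
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles of cells joined by a single edge
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_cluster = {node: 1 for node in two_clusters}

print(modularity(edges, two_clusters))  # about 0.357 (5/14)
print(modularity(edges, one_cluster))   # 0.0
```

Splitting the graph into its two natural groups scores higher than lumping every cell together, which is the sense in which higher modularity indicates a better clustering.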

We will use t-SNE to visualize the results of Graph-based clustering.

t-SNE

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensional reduction technique that prioritizes local relationships to build a low-dimensional representation of the high-dimensional data that places objects that are similar in high-dimensional space close together in the low-dimensional representation. This makes t-SNE well suited for analyzing high-dimensional data when the goal is to identify groups of similar objects, such as cell types in single cell RNA-Seq data.

Click the Graph-based clusters node
Click Exploratory analysis in the task menu
Click t-SNE

If you have multiple samples, you can choose to run t-SNE for each sample individually or for all samples together using the Split cells by sample option. Please note that this option will not be present if you are running t-SNE on a clustering result. For clarity, clustering results run with all samples together must be viewed together and clustering results run by sample must be viewed by sample.

Like Graph-based clustering, t-SNE takes PC values as its input and further reduces the data down to two or three dimensions. For consistency, you should use the same number of PCs as the input for t-SNE that you used for Graph-based clustering.

Click Apply
Click Finish to run (Figure 13)

A new t-SNE task node will be produced.

Double-click the t-SNE node to open the t-SNE task report (Figure 14). Use the panel on the left to modify the plot or add more plots to this Data viewer session.

The t-SNE scatter plot is interactive and can be viewed in 2D or 3D. The t-SNE plot is 3D by default. You can rotate the 3D plot by left-clicking and dragging your mouse or using Control under Configure. You can zoom in and out using your mouse wheel. You can pan by right-clicking and dragging your mouse. You can use Style to modify color, shape, size, and labeling (e.g. add a fog effect to improve depth perception on the plot). Add a 2D plot by clicking New plot, selecting 2D Scatter plot and selecting t-SNE as the source of the data.

Coloring the t-SNE scatter plot

Click on the plot to ensure that the plot window is selected. Click Style under Configure to color the t-SNE.

Color by the options in the drop-down menu under Color. You should be on the normalized counts node which can be seen by hovering over or clicking the circle (node) to the right of the drop-down.
Click the text field in the drop-down and start typing CD79A then select the gene by clicking on it (Figure 15)

The cells on the plot will be colored based on their expression level of CD79A (Figure 16). In the example in Figure 16, the Style icon has been dragged to a different location on the screen and the legend has also been resized and moved. Resizing the legend can either be done on the legend itself or using the Description icon under Configure.

Clicking a cell on the plot shows the expression values of the cell in the legend. Hovering over a cell on the plot also shows this information and related details (Figure 18).

If you want to color by more than three genes at a time, such as by a list of genes that distinguish a particular cell type, you can use the color by Feature list option.

Select Feature List from the Color by drop-down
Choose Cytotoxic cells from the List drop-down (use List management in Settings to add lists to Partek Flow which will automatically make them available here)
Choose PCA from the Metric drop-down

Coloring by a list, in this way, calculates the first three principal components for the gene list and colors the cells on the plot by their values along those three PCs with green for PC1, red for PC2, and blue for PC3 (Figure 19).
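The mapping from three PC values to a single color can be sketched as below. The channel assignment (green for PC1, red for PC2, blue for PC3) comes from the description above; the min-max scaling to the 0-255 range, the function name, and the example scores are assumptions made for illustration.

```python
def pcs_to_rgb(pc_scores):
    """pc_scores: list of (pc1, pc2, pc3) tuples, one per cell.

    Returns one (red, green, blue) tuple per cell with channels in 0-255.
    """
    def scale(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0] * len(values)  # no variation along this PC
        return [round(255 * (v - lo) / (hi - lo)) for v in values]

    pc1, pc2, pc3 = (scale(list(axis)) for axis in zip(*pc_scores))
    # PC1 drives green, PC2 drives red, PC3 drives blue
    return [(r, g, b) for g, r, b in zip(pc1, pc2, pc3)]

# Hypothetical PC values for three cells
scores = [(4.0, 0.0, 0.0),   # high on PC1
          (0.0, 0.0, 0.0),   # low on every PC
          (0.0, 2.0, 0.0)]   # high on PC2

colors = pcs_to_rgb(scores)
print(colors)  # [(0, 255, 0), (0, 0, 0), (255, 0, 0)]
```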

Typically, the expression of a set of marker genes will be highly correlated, allowing the first PC to account for a large percentage of the variance between cells for that gene list. As a result, the group of cells characterized by their expression of the genes on the list will separate from the rest of the cells along PC1 and will be colored green (Figure 16). If the gene list is more complex, for example, including marker genes for multiple cell types, there may be several sets of correlated genes accounting for significant amounts of variance, leading to groups of cells being distinguishable along PC2 and PC3 as well. In that case, there may be green, blue, and red groups of cells on the plot. If the gene list does not distinguish any group of cells, all cells will have similar PC values, leading to similarly colored cells on the plot.

In addition to coloring by gene expression and by gene lists, the points can be colored by any cell or sample attribute. Available attributes are listed as options in the Color by drop-down menu. Note that any available options are dependent upon the selected data node. In the following section we will use the attribute Graph-based to color our cells by the clusters identified in the Graph-based clustering task (Figure 20).

Selecting cells on the t-SNE scatter plot

Left-click and hold to draw a lasso around a cluster of cells
Release and click the starting circle to close the lasso and select the enclosed cells (Figure 21)

You can also create a lasso with straight lines using Lasso mode by clicking, releasing, and clicking again to draw a shape.

By default, selected cells are shown in bold while unselected cells are dimmed (Figure 22). This can be changed to gray selected cells using the Select & Filter tool in the left panel as shown in Figure 22.

Double-click any blank section of the scatter plot to clear the selection

Alternatively, you can select cells using any criteria available for the data node that is selected in the Select & Filter tool. To change the data selection click the circle (node) and select the data.

Choose Graph-based from the Criteria drop-down menu in the Select & Filter tool after ensuring you are on the Graph-based cluster node by hovering on the circle (Figure 23). If you are not on the correct node, you need to click the circle and select the data.

This adds check boxes for each level of the attribute (i.e., clusters). Click a check box to select the cells with that attribute level.

Click only 2 and 3

This selects cells from Graph-based clusters 2 and 3 (Figure 24). The number of selected cells is listed in the Legend on the plot.

Cells can also be selected based on their gene expression values in the Select & Filter section.

Click the circle and select the Normalized counts node which has gene expression data
Type cd3d in the text field of the drop-down
Click on CD3D to add it as criteria to select from and use the slider or text field to adjust the selected values. Pin the histogram to visualize the distribution during selection.

Very specific selections can be configured by adding criteria in this way. In the example below, cells in clusters 2 and 3 with high CD3D expression are selected (Figure 25).
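Combining criteria amounts to a logical AND across them. The sketch below selects cells that belong to chosen clusters and also exceed an expression threshold for one gene. The data structure, cell IDs, expression values, and the threshold of 5.0 are hypothetical.

```python
def select_cells(cells, clusters, gene, min_expression):
    """Return IDs of cells in the given clusters whose expression of gene meets the threshold."""
    return [cell_id for cell_id, info in cells.items()
            if info["cluster"] in clusters
            and info["expression"].get(gene, 0.0) >= min_expression]

# Hypothetical cells with a cluster label and normalized expression values
cells = {
    "cell_1": {"cluster": 2, "expression": {"CD3D": 6.1}},
    "cell_2": {"cluster": 2, "expression": {"CD3D": 0.4}},
    "cell_3": {"cluster": 3, "expression": {"CD3D": 5.2}},
    "cell_4": {"cluster": 5, "expression": {"CD3D": 7.0}},
}

selected = select_cells(cells, clusters={2, 3}, gene="CD3D", min_expression=5.0)
print(selected)  # ['cell_1', 'cell_3']
```

Note that cell_4 has high CD3D expression but is not selected because it falls outside clusters 2 and 3.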

Filtering cells on the t-SNE scatter plot

Once a cell has been selected on the plot, it can be filtered. The filter controls can exclude or include (only) any selected cell. Filtering can be particularly useful when you want to use a gene expression threshold to classify a group of cells, but the gene in question is not exclusively expressed by your cell type of interest.

In this example we can filter to include just cells from the selection we have already made.

The plot will update to show only the included cells as seen in Figure 26.

Cells that are not shown on the plot cannot be selected, allowing you to focus on the visible cells. The number of cells shown on the plot out of the total number of original cells is shown in the Legend. You can adjust the view to focus on only the included cells.

Additional inclusion or exclusion filters can be added to focus on a smaller subset of cells.

Click Clear filters to remove applied filters

The plot will update to show all cells and return to the original scaling.

Classifying cells

Classifying cells allows you to assign cells to groups that can be used in downstream analysis and visualizations. Commonly, this is used to describe cell types, such as B cells and T cells, but it can be used to describe any group of cells that you want to consider together in your analysis, such as cycling cells or CD14 high expressing cells. Each cell can only belong to one class at a time, so you cannot create overlapping classes.

To classify a cell, just select it then click Classify selection in the Classify tool.

For example, we can classify a cluster of cells expressing high levels of CD79A as B cells.

Set Color by in the Style configuration to the normalized counts node
Type CD79A in the search box and select it. Rotate the 3D plot if you need to see this cluster more clearly.
Draw a lasso around the cluster of CD79A-expressing cells (Figure 28)

Because most of these cells express CD79A, a B cell marker, and because they cluster together on the t-SNE, suggesting they have similar overall gene expression, we believe that all these cells are B cells.

Click Classify under Tools in the left panel
Type B cells for the Name
Click Save (Figure 29)

You can edit the name of a classification or delete it. In this project we use the hosted feature lists for "NK cells", "T cells" and "Monocytes" to classify these cell types by coloring the cells in the t-SNE plot and selecting the cells expressing those genes as shown above. See the list management documentation for more information on how to add these lists. The classifications you have made are saved as a working draft so if you close the plot and return to it, the classifications will still be there and can be visualized on the plot as "New classification". However, classifications are not available for downstream tasks until you apply them. Continue classifying the clusters and save the Data viewer session until you are ready to apply the classification to the data project.

Color by New classifications under Style (Figure 30) while you are still working on the classifications

To use the classifications in downstream tasks and visualizations, you must first apply them.

Click Apply classifications
Name the classification (e.g. Classified Cell Types)
Click Run to confirm

Once you have added a classification to the project, you can color the t-SNE plot by the Classification.

Here, I classified a few additional cell types using a combination of known marker genes and the clustering results then applied the classification (Figure 31).

Summarize Classifications with the number and percentage of cells from each sample that belong to each classification using an Attribute table under New plot. This is particularly useful when you are classifying cells from multiple samples.

Click New plot
Select Attribute table and the source of data (Figure 32) which in this case is called Classify result
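The summary shown in the Attribute table, the number and percentage of cells from each sample in each classification, can be reproduced with a simple tally. The function name, sample names, and labels below are hypothetical.

```python
from collections import Counter

def attribute_table(cells):
    """cells: list of (sample, classification) pairs, one per cell.

    Returns {sample: {classification: (count, percent_of_sample)}}.
    """
    per_sample = {}
    for sample, label in cells:
        per_sample.setdefault(sample, Counter())[label] += 1
    table = {}
    for sample, counts in per_sample.items():
        total = sum(counts.values())
        table[sample] = {label: (n, 100.0 * n / total) for label, n in counts.items()}
    return table

# Hypothetical classified cells from one sample
cells = [("Sample 1", "B cells"),
         ("Sample 1", "T cells"),
         ("Sample 1", "T cells"),
         ("Sample 1", "T cells")]

print(attribute_table(cells))
# {'Sample 1': {'B cells': (1, 25.0), 'T cells': (3, 75.0)}}
```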

Click on the Normalized counts node
Navigate to the Compute biomarkers task under Statistics in the task menu
Follow the task dialogue and click Finish (Figure 33)
Double click the Biomarkers node to view the Biomarkers results

Comparing gene expression between cell types

A common goal in single cell analysis is to identify genes that distinguish a cell type. To do this, you can use the differential analysis tools in Partek Flow. I will show how to use the ANOVA test in Partek Flow, a statistical test shown to be highly effective for differential analysis of single cell RNA-Seq data.

Click the Normalized counts results node
Click Statistics in the toolbox
Click Differential Analysis
Select ANOVA as the Method to use for differential analysis

The first page of the configuration dialog asks what attributes you want to include in the statistical test. Here, we only want to consider the Classifications, but in a more complex experiment, you could also include experimental conditions or other sample attributes.

Click Classified Cell Types
Click Next (Figure 34)

We will make a comparison between NK cells and all the other cell types to identify genes that distinguish NK cells. You can also use this tool to identify genes that differ between two cell types or genes that differ in the same cell type between experimental conditions.

Drag NK cells to the top panel

The top panel is the numerator for fold-change calculations so the experimental or test groups should be selected in the top panel.

Click all the other classifications in the bottom panel

The bottom panel is the denominator for fold-change calculations so the control group should be selected in the bottom panel.

Click Add comparison

This adds the comparison to the statistical test.

Click Finish to run the ANOVA task (Figure 35)

Double-click the newly generated data node to open the ANOVA task report

The ANOVA task report lists genes on rows and the results of the statistical test (p-value, fold change, etc.) on columns (Figure 36). For more information, please see our documentation page on the ANOVA task report.
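For intuition about the statistic behind the report, here is a minimal one-way ANOVA sketch that computes the F statistic for a single gene. It is a textbook calculation with hypothetical expression values, not Partek Flow's implementation, and it omits the p-value, which would come from the F distribution.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; groups is a list of lists of values."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical normalized expression of one gene
nk_cells = [5.0, 6.0, 7.0]
other_cells = [1.0, 2.0, 3.0]

print(one_way_anova_f([nk_cells, other_cells]))  # 24.0
```

A large F value means the difference between group means is large relative to the spread within each group.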

The Feature plot viewer will open showing a dot plot for CCL4 which can be modified to summarize the data in different ways (Figure 37). In the image below, the red boxes highlight the changes that were made to configure the plot. This includes overlaying the violins (density plots with the width corresponding to frequency) on the dot plot represented by the Classified Cell Types.

You can switch the grouping of cells. To do this, show the X axis labels then click and drag the labels to reposition the cell types on the plot.

Click ANOVA report to return to the table

The table lists all of the genes in the data set; using the filter control panel on the left, we can filter to just the genes that are significantly different for the comparison.

Click FDR step up and click the arrow next to it
Set to 1e-8

Here, we are using a very stringent cutoff to focus only on genes that are specific to NK cells, but other applications may require a less stringent cutoff.

Click Fold change and click the arrow next to it
Set to -2 to 2

The number of genes at the top of the filter control panel updates to indicate how many genes are left after the filters are applied.

The ANOVA report will close and a new task, the Differential analysis filter, will run and generate a filtered Feature list data node.

For more information about the ANOVA task, please see the Differential Gene Expression - ANOVA section of our user manual.

Generating a heatmap

Once we have filtered to a list of significantly different genes, we can visualize these genes by generating a heatmap.

Click the Filtered feature list data node produced by the Differential analysis filter
Click Exploratory analysis in the toolbox
Click Hierarchical clustering / heatmap

The hierarchical clustering task will generate the heatmap; choose Heatmap as the plot type. You can choose to Cluster features (genes) and cells (samples) under Feature order and Cell order in the Ordering section. You will almost always want to cluster features as this generates the clear blocks of color that make heatmaps comprehensible. For single cell data sets, you may choose to forgo clustering the cells in favor of ordering them by the attribute of interest. Here, we will not cluster the cells, but instead order them by their classification.

Click Assign order under Cell order

You can filter samples using the Filtering section of the configuration dialog. Here, we will not filter out any samples or cells.

Choose Classification from the Ordering drop-down menu
Drag NK cells to the top of the Sample order
Click Finish to run (Figure 38)

Double-click the Hierarchical cluster task node to open the task report

It may initially be hard to distinguish striking differences in the heatmap. This is common in single cell RNA-Seq data because outlier cells will skew the high and low ends. We can adjust the minimum and maximum of the color scheme to improve the appearance of the heatmap.

Click Heatmap
Toggle on the Range Min and set to -2
Toggle on the Range Max and set to 2
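The effect of setting a fixed color range can be thought of as clipping: values beyond the range are drawn with the most extreme color, so a few outliers no longer stretch the scale. The sketch below illustrates this with hypothetical values and assumes the heatmap values are on a standardized scale.

```python
def clip_to_range(values, range_min=-2.0, range_max=2.0):
    """Limit each value to the interval [range_min, range_max]."""
    return [max(range_min, min(range_max, v)) for v in values]

# Hypothetical standardized expression values for one gene, including two outliers
values = [-5.3, -1.0, 0.2, 1.8, 9.6]

print(clip_to_range(values))  # [-2.0, -1.0, 0.2, 1.8, 2.0]
```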

Distinct blocks of red and blue are now more pronounced on the plot. Cells are on rows and genes are on columns. Because of the limited number of pixels on the screen, genes are grouped. You can zoom in using the zoom controls or your mouse wheel if you want to view individual gene rows. We can annotate the plot with cell attributes.

Choose Classified Cell Types from the Annotations drop-down menu
Change the Annotation font size under Style in the Annotations section

The plot now includes blocks of color along the left edge indicating the classification of the cells. We can transpose the plot to give the cell labels a bit more space.

Toggle off the Row labels under Axes to remove the sample labels

Performing enrichment analysis

While a long list of significantly different genes is important information about a cell type, it can be difficult to identify what the biological consequences of these changes might be just by looking at the genes one at a time. Using enrichment analysis, you can identify gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.

Click the Feature list data node produced by the Differential analysis filter
Click Biological interpretation
Click Gene set enrichment

We distribute the gene sets from the Gene Ontology Consortium, but Gene set enrichment can work with any custom or public gene set database.

Choose the latest assembly available from the Gene set drop-down
Click Finish
Double-click the Gene set enrichment task node to open the task report

The Gene set enrichment task report lists gene sets on rows with an enrichment score and p-value for each. It also lists how many genes in the gene set were in the input gene list and how many were not (Figure 40). Clicking the Gene set ID links to the geneontology.org page for the gene set.
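The counts in the report (genes in the set that are in the list, and genes in the set that are not) are the inputs to an over-representation test. The sketch below uses a hypergeometric upper-tail probability, a common choice for this kind of test. Which test and enrichment score Partek Flow uses is not stated here, so the function, its parameters, and the numbers are illustrative assumptions.

```python
from math import comb

def enrichment_p_value(in_set_in_list, in_set_not_in_list, list_size, background_size):
    """Probability of at least the observed overlap if the list were drawn at random."""
    set_size = in_set_in_list + in_set_not_in_list
    max_overlap = min(set_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(comb(set_size, k) * comb(background_size - set_size, list_size - k)
               for k in range(in_set_in_list, max_overlap + 1))
    return tail / total

# Hypothetical example: 20 background genes, 5 in the significant list,
# and a gene set of 4 genes of which 3 are in the list
p = enrichment_p_value(in_set_in_list=3, in_set_not_in_list=1,
                       list_size=5, background_size=20)
print(round(p, 4))  # 0.032
```

A small probability suggests the gene set is over-represented in the list beyond what chance alone would produce.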

In Partek Flow, you can also check for enrichment of KEGG pathways using the Pathway enrichment task. The task is quite similar to the Gene set enrichment task, but uses KEGG pathways as the gene sets.

The task report is similar to the Gene set enrichment task report with enrichment scores, p-values, and the number of genes in and not in the list (Figure 41).

Clicking the KEGG pathway ID in the Pathway enrichment task report opens a KEGG pathway map (Figure 42). The KEGG pathway maps have fold-change and p-value information from the input gene list overlaid on the map, adding a layer of additional information about whether the pathway was upregulated or downregulated in the comparison.

Colors are customizable using the control panel on the left and the plot is interactive. Mousing over gene boxes gives the genes accounted for by the box, with genes present in the input list shown in bold, and the coloring gene shown in red (Figure 43).

Clicking a pathway box opens the map of that pathway, providing an easy way to explore related gene networks.