Installation Guide

Partek Flow is a web-based application for genomic data analysis and visualization. It can be installed on a desktop computer, computer cluster or cloud. Users can then access Partek Flow from any browser-enabled device, such as a personal computer, tablet or smartphone.

Read on to learn about the following installation topics:

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Minimum System Requirements

Web Browser Requirements

Regardless of whether Partek Flow is installed on a server or the cloud, users will be interacting with the software using a web browser. We support the latest Google Chrome, Mozilla Firefox, Microsoft Edge and Apple Safari browsers. While we make an effort to ensure that Partek Flow is robust, note that some browser plugins may affect the way the software is viewed on your browser.

Hardware Requirements (Single-node Linux)

If you are installing Partek Flow on your own single-node server, we require the following for successful installation:

Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8 or later versions of these distributions
64-bit 2 GHz quad-core processor [1]
48 GB of RAM [2]
> 2 TB of storage available for data
> 100 GB on the root partition
A broadband internet connection

We support Docker-based installations. Please contact support@partek.com for more information.

[1] Some analyses have higher system requirements. For example, running the STAR aligner on a reference genome of ~3 GB (such as human, mouse or rat) requires 16 cores.

[2] Input sample file size can also affect memory usage, particularly for TopHat alignments.

Increasing hardware resources (cores, RAM, disk space, and speed) will allow for faster processing of more samples.
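
As a quick pre-installation check, the minimums listed above can be compared against the host with standard Linux tools. The following is only a sketch: check_min is a hypothetical helper, the thresholds are copied from the list above, and the root partition figure is assumed to mean total size. Because /proc/meminfo reports slightly less than the installed RAM and the value is truncated to whole GB, a machine with exactly 48 GB may be reported as low.

```shell
#!/bin/sh
# Sketch: compare this host with the single-node minimums listed above.
# check_min is a hypothetical helper, not part of Partek Flow.

check_min() {
    # usage: check_min LABEL ACTUAL REQUIRED   (integers in the same unit)
    if [ "$2" -ge "$3" ] 2>/dev/null; then
        echo "OK   $1: $2 (minimum $3)"
    else
        echo "LOW  $1: $2 (minimum $3)"
        return 1
    fi
}

# Number of CPU cores available to this shell.
cores=$(nproc 2>/dev/null || echo 0)
# MemTotal is reported in kB; convert to whole GB.
ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo 2>/dev/null || echo 0)
# Total size of the root partition in whole GB (POSIX df output, 1 kB blocks).
root_gb=$(df -Pk / | awk 'NR==2 {printf "%d", $2 / 1024 / 1024}')

check_min "CPU cores" "$cores" 4 || true
check_min "RAM (GB)" "$ram_gb" 48 || true
check_min "Root partition (GB)" "$root_gb" 100 || true
```

The data volume can be checked the same way by pointing df at the data directory instead of the root partition.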

If you are licensed for the Single Cell Toolkit, please see Single Cell Toolkit System Requirements for amended hardware requirements.

Hardware Requirements (Cluster or Cloud)

Storage Recommendations

Proper storage planning is necessary to avoid future frustration and costly maintenance. Here are several DO's and DO NOT's:

DO:

Plan for at least 3 to 5 times more storage than you think is necessary. Investing extra funds in storage and storage performance is always worth it.
Keep all Flow data on a single partition that is expandable, such as RAID or LVM.
Back up your data, especially the Partek Flow database.

DO NOT:

Store data on 'removable' USB drives. Partek Flow will not be able to see these drives.
Store data across multiple partitions or folder locations. This will increase the maintenance burden substantially.
Use non-Linux file systems like NTFS.
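
To see which file system a planned data directory actually sits on, something like the sketch below may help. The helper names is_linux_fs and fs_type_of are hypothetical, and the list of accepted file system types is an assumption made for illustration, not a list published by Partek. The script relies on df -T, which GNU df provides.

```shell
#!/bin/sh
# Sketch: flag data directories that sit on a non-Linux file system.
# The list of accepted types is an assumption, not a Partek-published list.

is_linux_fs() {
    # usage: is_linux_fs FS_TYPE
    case "$1" in
        ext2|ext3|ext4|xfs|btrfs) return 0 ;;
        *) return 1 ;;
    esac
}

fs_type_of() {
    # usage: fs_type_of DIRECTORY   (relies on df -T, available in GNU df)
    df -PT "$1" 2>/dev/null | awk 'NR==2 {print $2}'
}

# Directory to inspect; defaults to / when no argument is given.
data_dir=${1:-/}
fs_type=$(fs_type_of "$data_dir")
if is_linux_fs "$fs_type"; then
    echo "$data_dir is on $fs_type"
else
    echo "$data_dir is on '$fs_type'; check the recommendations above"
fi
```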

Single Cell Toolkit System Requirements

Because of the large size of single cell RNA-Seq data sets and the computationally-intensive tools used in single cell analysis, we have amended our system requirements and recommendations for installations of Partek Flow with the Single Cell toolkit.

Up to 100,000 cells per analysis

Required

Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 64 GB of RAM
Local scratch space*: 1 TB with cached or native speeds of 2GB/s or higher
Storage: > 2 TB available for data and > 100 GB on the root partition

More than 100,000 cells per analysis

Required

Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 256 GB of RAM
Local scratch space*: 2 TB with cached or native speeds of 2GB/s or higher
Storage: > 4 TB available for data

Single Node Installation

This document describes how to set up and configure a single-node Partek Flow license.

Installing on Linux

Single Node Amazon Web Services Deployment

Creating a New Elastic Compute Cloud Instance for Partek Flow Software
Enabling External Access to the Partek Flow Elastic Compute Cloud Instance
Attaching the Amazon Elastic Block Store Volume for Partek Flow Data Storage
Installing Partek Flow on a New Elastic Compute Cloud Instance
Partek Amazon Web Services Support
General Recommendations
Amazon Web Services Instance Type Resources and Costs
Elastic Block Store Volumes

Creating a New Elastic Compute Cloud Instance for Partek Flow Software

Note: This guide assumes that none of the items necessary for the Amazon Elastic Compute Cloud (EC2) instance exist yet, such as the Amazon Virtual Private Cloud (VPC), subnets, and security groups, so their creation is covered as well.

Click on EC2

Switch to the region intended to deploy Partek Flow software. This tutorial uses US East (N. Virginia) as an example.

On the left menu, click on Instances, then click the Launch Instance button. The Choose an Amazon Machine Image (AMI) page will appear.

Click the Select button next to Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-f4cc1de2. NOTE: Please use the latest Ubuntu AMI. It is likely that the AMI listed here will be out of date.

Choose an Instance Type. The selection depends on your budget and the size of the Partek Flow deployment. We recommend m4.large for testing or cluster front-end operation, m4.xlarge for standard deployments, and m4.2xlarge for alignment-heavy workloads with a large user base. See the section Amazon Web Services Instance Type Resources and Costs for assistance with choosing the right instance. In most cases, the instance type and associated resources can be changed after deployment, so you are not locked into the choices made in this step.

NOTE: New instance types will become available. Please use the latest mX instance type provided as it will likely perform better and be more cost effective than older instance types.

On the Configure Instance Details page, make the following selections:

Set the number of instances to 1. An autoscaling group is not necessary for single-node deployments
Purchasing Option: Leave Request Spot Instances unchecked. This is relevant for cost-minimization of Partek Flow cluster deployments.
Network: If you do not have a virtual private cloud (VPC) already created for Partek Flow, click Create New VPC. This will open a new browser tab for VPC management.
- Use the following settings for the VPC:
  - Name Tag: Flow-VPC
  - IPv4 CIDR block: 10.0.0.0/16
  - Select No IPv6 CIDR Block
  - Tenancy: Default
- Click Yes, Create. You may be asked to select a DHCP Option set. If so, then make sure the dynamic host configuration protocol (DHCP) option set has the following properties:
  - Options: domain-name = ec2.internal;domain-name-servers = AmazonProvidedDNS;
  - DNS Resolution: leave the defaults set to yes
  - DNS Hostname: change this to yes as internal DNS resolution may be necessary depending on the Partek Flow deployment
- Once created, the new Flow-VPC will appear in the list of available VPCs. The VPC needs additional configuration for external access. To continue, right click on Flow-VPC and select Edit DNS Resolution, select Yes, and then Save. Next, right click the Flow-VPC and select Edit DNS Hostnames, select Yes, then Save.
- Make sure the DHCP option set is set to the one created above. If it is not, right-click on the row containing Flow-VPC and select Edit DHCP Option Sets.
- Close the VPC Management tab and go back to the EC2 Management Console.
Click the refresh arrow next to Create New VPC and select Flow-VPC.
Click Create New Subnet and a new browser tab will open with a list of existing subnets. Click Create Subnet and set the following options:
- Name Tag: Flow-Subnet
- VPC: Flow-VPC
- VPC CIDRs: This should be automatically populated with the information from Flow-VPC
- Availability Zone: It is OK to let Amazon choose for you if you do not have a preference
- IPv4 CIDR block: 10.0.1.0/24
Stay on the VPC Dashboard Tab and on the left navigation menu, click Internet Gateways, then click Create Internet Gateway and use the following options:
- Name Tag: Flow-IGW
- Click Yes, Create
The new gateway will be displayed as Detached. Right click on the Flow-IGW gateway and select Attach to VPC, then select Flow-VPC and click Yes, Attach.
Click on Route Tables on the left navigation menu.
If it exists, select the route table already associated with Flow-VPC. If not, make a new route table and associate it with Flow-VPC. Click on the route table, then click the Routes tab toward the bottom of the page. The route Destination = 10.0.0.0/16 Target = local should already be present. Click Edit, then click Add another route and set the following parameters:
- Destination: 0.0.0.0/0
- Target set to Flow-IGW (the internet gateway that was just created)
Click Save
Close the VPC Dashboard browser tab and go back to the EC2 Management Console tab. Note that you should still be on Step 3: Configure Instance Details.
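
The subnet's IPv4 CIDR block has to fall inside the VPC's CIDR block. With the values used above this is easy to see (10.0.1.0/24 is one /24 slice of 10.0.0.0/16), but if you choose different ranges, the arithmetic can be checked with a small sketch like the one below. The helpers ip_to_int and cidr_contains are hypothetical names written for this illustration and are not AWS or Partek tools.

```shell
#!/bin/sh
# Sketch: confirm that a subnet CIDR block lies inside a VPC CIDR block.
# ip_to_int and cidr_contains are hypothetical helpers for illustration.

ip_to_int() {
    # Convert a dotted-quad IPv4 address to a 32-bit integer.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

cidr_contains() {
    # usage: cidr_contains OUTER_CIDR INNER_CIDR
    outer_ip=${1%/*}; outer_len=${1#*/}
    inner_ip=${2%/*}; inner_len=${2#*/}
    # The inner block cannot be larger than the outer block.
    [ "$inner_len" -ge "$outer_len" ] || return 1
    # Network mask of the outer block, e.g. /16 -> 255.255.0.0.
    mask=$(( (4294967295 << (32 - outer_len)) & 4294967295 ))
    outer_net=$(( $(ip_to_int "$outer_ip") & mask ))
    inner_net=$(( $(ip_to_int "$inner_ip") & mask ))
    [ "$outer_net" -eq "$inner_net" ]
}

if cidr_contains 10.0.0.0/16 10.0.1.0/24; then
    echo "10.0.1.0/24 (Flow-Subnet) is inside 10.0.0.0/16 (Flow-VPC)"
fi
```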

Click the refresh arrow next to Create New Subnet and select Flow-Subnet.

Auto-assign Public IP: Use subnet setting (Disable)

Placement Group: No placement group

IAM role: None.

Note: For multi-node Partek Flow deployments or instances where you would like Partek to manage AWS resources on your behalf, please see Partek AWS support and set up an IAM role for your Partek Flow EC2 instance. In most cases a specialized IAM role is unnecessary and we only need instance ssh keys.

Shutdown Behaviour: Stop

Enable Termination Protection: select Protect against accidental termination

Monitoring: leave Enable CloudWatch Detailed Monitoring disabled

EBS-optimized Instance: Make sure Launch as EBS-optimized Instance is enabled. Given the recommended choice of an m4 instance type, EBS optimization should be enabled at no extra cost.

Tenancy: Shared - Run a shared hardware instance

Network Interfaces: leave as-is

Advanced Details: leave as-is

Click Next: Add Storage. You should be on Step 4: Add Storage

For the existing root volume, set the following options:

Size: 8 GB
Volume Type: Magnetic
Select Delete on Termination
- Note: All Partek Flow data is stored on a non-root EBS volume. Since only the OS is on the root volume and the instance is not rebooted frequently, a fast root volume is probably not necessary or worth the cost. For more information about EBS volumes and their performance, see the section Elastic Block Store Volumes.

Click Add New Volume and set the following options:

Volume Type: EBS
Device: /dev/sdb (take the default)
Do not define a snapshot
Size (GiB): 500
- Note: This is the minimum for ST1 volumes, see: EBS volumes
Volume Type: Throughput optimized HDD (ST1)
Do not delete on terminate or encrypt

Click Next: Add Tags

You do not need to define any tags for this new EC2 instance, but you can if you would like.

Click Next: Configure Security Group

For Assign a Security Group select Create a New Security Group
Security Group Name: Flow-SG
Description: Security group for Partek Flow server
Add the following rules:
- SSH set Source to My IP (or the address range of your company or institution)
- Click Add Rule:
- Set Type to Custom TCP Rule
- Set Port Range to 8080
- Set Source to anywhere (0.0.0.0/0, ::/0)
  - Note: It is recommended to restrict Source to just those that need access to Partek Flow.

Click Review and Launch

The AWS console will suggest that this server not be booted from a magnetic volume. Since there is not a lot of IO on the root partition and reboots will be rare, choosing Continue with Magnetic will reduce costs. Choosing an SSD volume will not provide substantial benefit, but it is OK if you wish to use one. See the Elastic Block Store Volumes section for more information.

Click Launch

Create a new keypair:

Name the keypair Flow-Key
Download this keypair, then run chmod 600 Flow-Key.pem (the downloaded key) so it can be used.
Back up this key, as you may lose access to the Partek Flow instance without it.

The new instance will now boot. Use the left navigation bar and click on Instances. Click the pencil icon and assign the instance the name Partek Flow Server

Enabling External Access to the Partek Flow Elastic Compute Cloud Instance

The server should be assigned a fixed IP address. To do this, click on Elastic IPs on the left navigation menu from the EC2 Management Console.

Click Allocate New Address
Assign Scope to VPC
Click Allocate

On the table containing the newly allocated elastic IP, right click and select Associate Address

For Instance, select the instance named Partek Flow Server
For Private IP, select the one private IP available for the Partek Flow EC2 instance, then click Associate

Note: For the remaining steps, we refer to the elastic ip as elastic.ip

SSH to the new Partek Flow Server instance:

$ chmod 600 Flow-Key.pem

$ ssh -i Flow-Key.pem ubuntu@elastic.ip

Attaching the Amazon Elastic Block Store Volume for Partek Flow Data Storage

Attach, format, and move the ubuntu home directory onto the large ST1 elastic block store (EBS) volume. All Partek Flow data will live in this volume. Consult the AWS EC2 documentation for further information about attaching EBS volumes to your instance.

$ sudo su

$ mkfs -t ext4 /dev/xvdb

Note: Under Volumes in the EC2 management console, inspect Attachment Information. It will likely list the large ST1 EBS volume as attached to /dev/sdb. Replace "s" with "xv" to find the device name to use for this mkfs command.

Make a note of the newly created UUID for this volume

Copy the ubuntu home directory onto the EBS volume using a temporary mount point:

$ mount -t ext4 /dev/xvdb /mnt/

$ rsync -avr /home/ /mnt/

$ umount /mnt/

Make the EBS volume mount at system boot:

Add the following to /etc/fstab: UUID=the-UUID-from-the-mkfs-command-above /home ext4 defaults,nofail 0 2

$ mount -a

Disconnect the ssh session, then log in again to make sure all is well
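
The UUID step above can be scripted rather than copied by hand. The sketch below assumes the device name /dev/xvdb used in this guide; fstab_entry is a hypothetical helper that only formats the line. The blkid call needs root and a real device, so it is left as a comment and a placeholder UUID is used instead.

```shell
#!/bin/sh
# Sketch: build the /etc/fstab line for the EBS volume from its UUID.
# fstab_entry is a hypothetical helper; it only formats text.

fstab_entry() {
    # usage: fstab_entry UUID
    printf 'UUID=%s /home ext4 defaults,nofail 0 2\n' "$1"
}

# On the instance, the UUID assigned by mkfs can be read with blkid (as root):
#   uuid=$(blkid -s UUID -o value /dev/xvdb)
# A placeholder is used here so the sketch runs anywhere.
uuid="the-UUID-from-the-mkfs-command-above"

fstab_entry "$uuid"
```

On the instance, the line could then be appended with fstab_entry "$uuid" | sudo tee -a /etc/fstab, followed by mount -a as described above and df -h /home to confirm that the volume size looks right.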

Installing Partek Flow on a New Elastic Compute Cloud Instance

Note: For additional information about Partek Flow installations, see our generic Installation Guide

Before beginning, send the media access control (MAC) address of the EC2 instance to licensing@partek.com. The output of ifconfig will suffice. Given this information, Partek employees will create a license for your AWS server. MAC addresses will remain the same after stopping and starting the Partek Flow EC2 instance. If the MAC address does change, let our licensing department know and we can add your license to our floating license server or suggest other workarounds.

Install required packages for Partek Flow:

$ sudo apt-get update

$ sudo apt-get install software-properties-common

$ sudo add-apt-repository -y ppa:openjdk-r/ppa

$ sudo apt-get install openjdk-8-jdk python python-pip python-dev zlib1g-dev python-matplotlib r-base python-htseq libxml2-dev perl make gcc g++ zlib1g libbz2-1.0 libstdc++6 libgcc1 libncurses5 libsqlite3-0 libfreetype6 libpng12-0 zip unzip libgomp1 libxrender1 libxtst6 libxi6 debconf

$ sudo pip install --upgrade pip && pip install --upgrade --upgrade-strategy eager --force-reinstall virtualenv numpy pysam cnvkit

Install Partek Flow:

Note: Make sure you are running as the ubuntu user.

$ cd    # we will install Partek Flow to ubuntu's home directory

$ wget --content-disposition packages.partek.com/linux/flow-release

$ unzip PartekFlow*.zip

$ ./partek_flow/start_flow.sh

Partek Flow has finished loading when you see INFO: Server startup in xxxxxxx ms in the partek_flow/logs/catalina.out log file. This takes ~30 seconds.
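
Instead of watching the log by hand, the startup message can be polled for. This is a minimal sketch: wait_for_startup is a hypothetical helper, and the 120-second timeout in the usage comment is an arbitrary choice. Note that catalina.out may still contain a startup line from an earlier run, in which case the function returns immediately.

```shell
#!/bin/sh
# Sketch: wait until the Tomcat log reports that the server has started.
# wait_for_startup is a hypothetical helper, not shipped with Partek Flow.

wait_for_startup() {
    # usage: wait_for_startup LOGFILE TIMEOUT_SECONDS
    log=$1
    remaining=$2
    while :; do
        # Succeed as soon as the startup message appears in the log.
        if grep -q 'Server startup in' "$log" 2>/dev/null; then
            return 0
        fi
        # Give up once the timeout has been used up.
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Example (not run here):
#   ./partek_flow/start_flow.sh
#   wait_for_startup partek_flow/logs/catalina.out 120 && echo "Partek Flow is up"
```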

Alternative: Install Flow with Docker. Our base packages are located here: https://hub.docker.com/r/partekinc/flow/tags

Open Partek Flow with a web browser: http://elastic.ip:8080/

Enter license key

Set up the Partek Flow admin account

Leave the library file directory at its default location and check that the free space listed for this directory is consistent with what was allocated for the ST1 EBS volume.

Done! Partek Flow is ready to use.

Partek Amazon Web Services Support

After the EC2 instance is provisioned, we are happy to assist with setting up Partek Flow or address other issues you encounter with the usage of Partek Flow. The quickest way to receive help is to allow us remote access to your server by sending us Flow-Key.pem and amending the SSH rule for Flow-SG to include access from IP 97.84.41.194 (Partek HQ). We recommend sending us the Flow-Key.pem via secure means. The easiest way to do this is with the following command:

$ curl -F "file=@Flow-Key.pem" https://installfeedback.partek.com/fupload

We also provide live assistance via GoTo meeting or TeamViewer if you are uncomfortable with us accessing your EC2 instance directly. Before contacting us, please run $ ./partek_flow/flowstatus.sh to send us logs and other information that will assist with your support request.

General Recommendations

With newer EC2 instance types, it is possible to change the instance type of an already deployed Partek Flow EC2 server. We recommend running several rounds of benchmarks with production-sized workloads and evaluating whether the resources allocated to your Partek Flow server are sufficient. Reducing the resources allocated to the Partek Flow server may bring significant cost savings, but it can also cause UI responsiveness and job run-times to reach unacceptable levels. Once you have found an instance type that works, you may wish to use reserved instance pricing, which is significantly cheaper than on-demand instance pricing. Reserved instances come with 1- or 3-year usage terms. Please see the EC2 Reserved Instance Marketplace to sell or purchase existing reserved instances at reduced rates.

The network performance of the EC2 instance type becomes an important factor if Partek Flow is used primarily for alignment. For this use case, large amounts of data move back and forth between the Partek Flow server and the end users (input fastq files in, output bam files out), so it is important to have what AWS refers to as high network performance, which in most cases is around 1 Gb/s. If the focus is primarily on downstream analysis and visualization (e.g. the primary input files are ADAT) then network performance is less of a concern.

We recommend HVM virtualization, as we have not seen any performance impact from using it and non-HVM instance types can come with significant deployment barriers.

Make sure your instance is EBS optimized by default and you are not charged a surcharge for EBS optimization.

T-class servers, although cheap, may slow responsiveness for the Partek Flow server and generally do not provide sufficient resources.

We do not recommend placing any data on instance store volumes since all data is lost on those volumes after an instance stops. This is too risky as there are cases where user tasks can take up unexpected amounts of memory forcing a server stop/reboot.

Amazon Web Services Instance Type Resources and Costs

The values below were updated April 2017. The latest pricing and EC2 resource offerings can be found at http://www.ec2instances.info

Single server recommendation: m4.xlarge or m4.2xlarge

Network performance values for US-EAST-1 correspond to: Low ~ 50Mb/s, Medium ~ 300Mb/s, High ~ 1Gb/s.

Elastic Block Store Volumes

Choice of a volume type and size:

This is dependent on the type of workload. For most users, the Partek Flow server tasks are alignment-heavy, so we recommend a throughput optimized HDD (ST1) EBS volume since most aligner operations are sequential in nature. For workloads that focus primarily on downstream analysis, a general purpose SSD volume will suffice but the costs are greater. For those who focus on alignment or host several users, the storage requirements can be high. ST1 EBS volumes have the following characteristics:

Max throughput 500 MiB/s

$0.045 per GB-month of provisioned storage ($22.50 per month for 500 GB of storage).

Note that EBS volumes can be grown or performance characteristics changed. To minimize costs, start with a smaller EBS volume allocation of 0.5 - 2 TB as most mature Partek Flow installations generate roughly this amount of data. When necessary, the EBS volume and the underlying file system can be grown on-line (making ext4 a good choice). Shrinking is also possible but may require the Partek Flow server to be offline.
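
The monthly storage cost scales linearly with the provisioned size, so the figure above can be recomputed for other volume sizes. The sketch below uses the $0.045 per GB-month price quoted in this section as a default; st1_monthly_cost is a hypothetical helper, and current AWS pricing may differ.

```shell
#!/bin/sh
# Sketch: monthly cost of a provisioned ST1 volume at a flat per-GB price.
# st1_monthly_cost is a hypothetical helper; the default price is the one
# quoted in this section and may be out of date.

st1_monthly_cost() {
    # usage: st1_monthly_cost SIZE_GB [PRICE_PER_GB_MONTH]
    awk -v gb="$1" -v price="${2:-0.045}" 'BEGIN { printf "%.2f\n", gb * price }'
}

st1_monthly_cost 500     # 500 GB, the ST1 minimum used in this guide
st1_monthly_cost 2000    # 2 TB, the upper end of the suggested starting range
```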

Multi-Node Cluster Installation

Partek Flow is a genomics data analysis and visualization software product designed to run on compute clusters. The following instructions assume the most basic setup of Partek Flow and must only be attempted by system administrators who are familiar with Linux-based commands. These instructions are not intended to be comprehensive. Cluster environments vary widely, so there are no 'one size fits all' instructions. The installation procedure on a computer cluster is highly dependent on the type of cluster and the environment in which it is located. We can support a large array of Linux distributions and configurations. In all cases, Partek Technical Support will be available to assist with cluster installation and maintenance to ensure compatibility with any cluster environment. Please consult with Partek Licensing Support (licensing@partek.com) for additional information.

Prior to installation, make sure you have the license key related to the host-ID of the compute cluster the software will be installed in. Contact licensing@partek.com for key generation.

Installation on a Computer Cluster
Integration with your queueing system
Bringing up workers
Shutting down workers
Updating Partek Flow

Installation on a Computer Cluster

Make a standard linux user account that will run the Partek Flow server and all associated processes. It is assumed this account is synced between the cluster head node and all compute nodes. For this guide, we name the account flow

Log in to the flow account and cd to the flow home directory

cd /home/flow

Download Partek Flow and the remote worker package

wget --content-disposition http://packages.partek.com/linux/flow

wget --content-disposition http://packages.partek.com/linux/flow-worker

Unzip these files into the flow home directory /home/flow. This yields two directories: partek_flow and PartekFlowRemoteWorker
Partek Flow can generate large amounts of data, so it needs to be configured to store the bulk of this data in the largest shared data store available. For this guide we assume that the directory is located at /shared. Adjust this path accordingly.
It is required that the Partek Flow server (which is running on the head node) and the remote workers (which are running on the compute nodes) see identical file system paths for any directory Partek Flow has read or write access to. Thus /shared and /home/flow must be mounted on the Flow server and all compute nodes. Create the directory /shared/FlowData and allow the flow linux account write access to it
It is assumed the head node is attached to at least two separate networks: (1) a public network that allows users to log in to the head node and (2) a private backend network that is used for communication between compute nodes and the head node. Clients connect to the Flow web server on port 8080 so adjust the firewall to allow inbound connections to 8080 over the public network of the head node. Partek Flow will connect to remote workers over your private network on port 2552 and 8443, so make sure those ports are open to the private network on the flow server and workers.
Partek Flow needs to be informed of what private network to use for communication between the server and workers. It is possible that there are several private networks available (gigabit, infiniband, etc.) so select one to use. We recommend using the fastest network available. For this guide, let's assume that private network is 10.1.0.0/16. Locate the headnode hostname that resolves to an address on the 10.1.0.0/16 network. This must resolve to the same address on all compute nodes.
For example:

host head-node.local yields 10.1.1.200

Open /home/flow/.bashrc and add this as the last line:

export CATALINA_OPTS="$CATALINA_OPTS -Djava.awt.headless=true
-DflowDispatcher.flow.command.hostname=head-node.local
-DflowDispatcher.akka.remote.netty.tcp.hostname=head-node.local"

Source .bashrc so the environment variable CATALINA_OPTS is accessible.

NOTE: If workers are unable to connect (below), then replace all hostnames with their respective IPs.

Start Partek Flow

~/partek_flow/start_flow.sh

You can monitor progress by tailing the log file partek_flow/logs/catalina.out. After a few minutes, the server should be up.
Make sure the correct ports are bound

netstat -tulpn

You should see 10.1.1.200:2552 and :::8080 as LISTENing. Inspect catalina.out for additional error messages.
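
The same check can be scripted by filtering the netstat output for the expected ports. In this sketch, port_is_listening is a hypothetical helper that reads netstat -tuln style output on standard input and looks at the local address column. The sample text is made up for illustration (it is not captured from a real server) and deliberately omits port 8443 to show what a missing port looks like.

```shell
#!/bin/sh
# Sketch: check netstat-style output for a listening port.
# port_is_listening is a hypothetical helper; the sample below is invented.

port_is_listening() {
    # usage: netstat -tuln | port_is_listening PORT
    awk -v port="$1" '
        /LISTEN/ {
            # Column 4 is the local address; the port follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

sample='tcp   0  0 10.1.1.200:2552  0.0.0.0:*  LISTEN
tcp6  0  0 :::8080          :::*       LISTEN'

for port in 2552 8080 8443; do
    if printf '%s\n' "$sample" | port_is_listening "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: not found"
    fi
done
```

On the head node, replace the sample with live output, for example netstat -tuln | port_is_listening 8080.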
Open a browser and go to http://localhost:8080 on the head node to configure the Partek Flow server.
Enter the license key provided (Figure 1)

If there appears to be an issue with the license or there is a message about 'no workers attached', then restart Partek Flow. It may take 30 sec for the process to shut down. Make sure the process is terminated before starting the server back up:

~/partek_flow/stop_flow.sh

Then run:

~/partek_flow/start_flow.sh

You will now be prompted to set up the Partek Flow admin user (Figure 2). Specify the username (admin), password and email address for the administrator account and click Next

Select a directory to store the library files that will be downloaded or generated by Partek Flow (Figure 3). All Partek Flow users share library files, and the size of the library folder can grow significantly. We recommend allocating at least 100 GB of free space for library files. The free space in the selected library file directory is shown. Click Next to proceed. You can change this directory after installation by changing system preferences. For more information, see Library file management.

To set up the Partek Flow data paths, click on Settings located on the top-right of the Flow server webpage. On the left, click on Directory permissions then Permit access to a new directory. Add /shared/PartekFlow and allow all users access.
Next click on System preferences on the left menu and change data download directory and default project output directory to /shared/PartekFlow/downloads and /shared/PartekFlow/project_output respectively

Note: If you do not see the /shared folder listed, click on the Refresh folder list link that is toward the bottom of the download directory dialog

Since you do not want to run any work on the head node, go to Settings>System preferences>Task queue and job processing and uncheck Start internal worker at Partek Flow server startup.
Restart the Flow server:

~/partek_flow/stop_flow.sh

After 30 seconds, run:

~/partek_flow/start_flow.sh

This is needed to disable the internal worker.

Test that remote workers can connect to the Flow server
Log in as the flow user to one of your compute nodes. Assume the hostname is compute-0. Since your home directory is exported to all compute nodes, you should be able to go to /home/flow/PartekFlowRemoteWorker/
To start the remote worker:

./partekFlowRemoteWorker.sh head-node.local compute-0

These two addresses should both be in the 10.1.0.0/16 address space. The remote worker will output to stdout when you run it. Scan for any errors. You should see the message woot! I'm online.
A successfully connected worker will show up on the Resource management page on the Partek Flow server. This can be reached from the main homepage or by clicking Resource management from the Settings page. Once you have confirmed the worker can connect, kill the remote worker (CTRL-C) from the terminal in which you started it.
Once everything is working, return to library file management and add the genomes/indices required by your research team. If Partek hosts these genomes/indices, these will automatically be downloaded by Partek Flow

Integration with your queueing system

In effect, all you are doing is submitting the following command as a batch job to bring up remote workers:

/home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh head-node.local compute-0

The second parameter for this script can be obtained automatically via:

$(hostname -s)
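
Put together, the body of a batch job might look like the sketch below. worker_command is a hypothetical helper that only assembles the command line from the paths and hostnames used in this guide; scheduler directives (queue, cores, memory, wall time) are site-specific and are intentionally left out.

```shell
#!/bin/sh
# Sketch: assemble the remote worker command for a batch job.
# worker_command is a hypothetical helper; paths and hostnames are the
# examples used in this guide and should be adjusted to your cluster.

worker_command() {
    # usage: worker_command HEAD_NODE_HOSTNAME [WORKER_HOSTNAME]
    head_node=$1
    # Default to the short hostname of the node the job landed on.
    worker=${2:-$(hostname -s)}
    echo "/home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh $head_node $worker"
}

# Print the command for the example compute node used above.
worker_command head-node.local compute-0

# Inside a real job script the command would be executed rather than printed:
#   exec $(worker_command head-node.local)
```

The job script is then submitted with your scheduler's own submission command (for example sbatch on Slurm or qsub on PBS or Grid Engine).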

Bringing up workers

Bring up workers by running the command below. You only need to run one worker per node:

/home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh head-node.local compute-0

Shutting down workers

Go to the Resource management page and click on the Stop button (red square) next to the worker you wish to shut down. The worker will shut down gracefully, as in it will wait for currently running work on that node to finish, then it will shut down.

Updating Partek Flow

For a cluster update, Partek support will provide links to the .zip files for Partek Flow and the remote Flow worker. All of the following actions should be performed as the Linux user that runs Flow. Do NOT run Flow as root.

Go to the Flow installation directory. This is usually the home directory of the Linux user that runs Flow and it should contain a directory named "partek_flow". The location of the Flow install can also be obtained by running ps aux | grep flow and examining the path of the running Flow executable.
Shut down Flow:

./partek_flow/stop_flow.sh

Download the new version of Flow and the Flow worker:

wget --content-disposition http://packages.partek.com/linux/flow-release

wget --content-disposition http://packages.partek.com/linux/flow-worker-release

Make sure Flow has exited:

ps aux | grep flow

The flow process should no longer be listed.

Back up the old install, then unpack the new version of Flow:

mv partek_flow partek_flow_prev

mv PartekFlowRemoteWorker PartekFlowRemoteWorker_prev

Back up the Flow database folder. This should be located in the home directory of the user that runs Flow.

tar -czvf partek-db-bkp-date.tgz ~/.partekflow
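
The word date in the archive name above is a placeholder. One way to fill it in automatically is sketched below; backup_name is a hypothetical helper, and the YYYY-MM-DD format is just one reasonable choice that sorts chronologically.

```shell
#!/bin/sh
# Sketch: generate a dated name for the database backup archive.
# backup_name is a hypothetical helper, not part of Partek Flow.

backup_name() {
    # usage: backup_name [YYYY-MM-DD]   (defaults to today's date)
    printf 'partek-db-bkp-%s.tgz\n' "${1:-$(date +%Y-%m-%d)}"
}

# Name for a fixed example date, then for today.
backup_name 2017-04-01
backup_name

# The backup itself (not run here), using the same tar command as above:
#   tar -czvf "$(backup_name)" ~/.partekflow
```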

Start the updated version of Flow:

./partek_flow/start_flow.sh

tail -f partek_flow/logs/catalina.out

(make sure there is nothing of concern in this file when starting up Flow. You can stop the file tailing by typing: CTRL-C)

You may also want to examine the main Flow log for errors:

~/.partekflow/logs/flow.log

Creating Restricted User Folders within the Partek Flow server

Partek Flow provides the infrastructure to isolate data from different users within the same server. This guide will provide general instructions on how to create this environment within Partek Flow. This can be modified to accommodate existing file systems already accessible to the server.

Go to Settings > Directory permissions and restrict parent folder access (typically /home/flow) to Administrator accounts only

Click the Permit access to a new directory button and navigate to the folder with your library files (typically /home/flow/FlowData/library_files). Select the All users (automatically updated) checkbox to permit all users (including those that will be added in the future) to see the library files associated with the Partek Flow server

Then go to System preferences > Filesystem and storage and set the Default project output directory to "Sample file directory"

Create your first user and select the Private directory checkbox. Specify where the private directory for that user is located

If needed, you can create a user directory by clicking Browse > Create new folder

This automatically sets browsing permissions for that private directory to that user

When a user creates a project, the default project output directory is now within their own restricted folder

More importantly, other users cannot see it

Add additional users as needed

Updating Partek Flow

Before performing updates, we recommend Backing Up the Database.

Updates are applied using the Linux package manager.

Make sure Partek Flow is stopped before updating it.

To update Partek Flow, open a terminal window and enter the following command.

For Debian/Ubuntu, enter:

For Redhat/Fedora/CentOS, enter:

For the YUM package manager, if updating Partek Flow fails with a message claiming "package not signed," enter:

Note that our packages are signed and the message above is erroneous.

To update a Tomcat build, download the latest version from the link below:

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Uninstalling Partek Flow

Linux

Open a terminal window and enter the following command.

Debian/Ubuntu:

RedHat/Fedora/CentOS:

The uninstall removes binaries only (/opt/partek_flow). The logs, database (partek_db) and files in the /home/flow/.partekflow folder remain unaffected.

MacOS

Stop and quit Partek Flow using the Partek Flow app in the menu bar.
Using Finder, delete the Partek Flow application from the Applications folder.

Figure 1. Control of Partek Flow through the menu bar

This process does not delete data or the library files. Users who wish to delete those can delete them using Finder or terminal. The default location of project output files and library files is the /FlowData directory under the user's home folder. However, the actual location may vary depending on your System or Project specific settings.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Dependencies

Flow ships with tasks that do not have all of their dependencies included. On startup Flow will attempt to install the dependencies, but not every system is equipped to install them.

In case of any difficulties, we highly recommend using a Docker deployment instead (cluster installations may require Singularity, which is still somewhat of a work in progress).

CNVkit

Requires Python 2.7 or later.

On startup Flow will attempt to install additional python packages using the command

Requires R 3.2.3 or later.

On startup Flow will attempt to install additional R packages.

There are cascading dependencies, but you can view the core libraries in partek_flow/bin/cnvkit-0.8.5/install.R

If these packages can't be built locally, it may be possible for the user to download them from us (see below).

DESeq2

Requires R 3.0 or later.

On startup Flow will attempt to install additional R packages.

There are cascading dependencies, but you can view the core libraries in partek_flow/bin/deseq_two-3.5/install.R

If these packages can't be built locally, it may be possible for the user to download them from us (see below).

RcppArmadillo may also have dependencies on multi-threading shared objects that may not be on the LD_LIBRARY_PATH

The recommendation is to copy those .so files to a folder and make sure it is available from the LD_LIBRARY_PATH when the server/worker starts.

Additional dynamic libraries (such as libxml2.so) may be missing and we can provide a copy appropriate for the target OS.
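The LD_LIBRARY_PATH recommendation above could be applied with a small helper like the one below. This is a sketch under assumptions: the function name add_flow_lib_dir and the folder /home/flow/lib are hypothetical, and the helper only changes the environment of the shell that later starts the server or worker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a folder of shared objects to LD_LIBRARY_PATH,
# skipping the change when the folder is already listed.
add_flow_lib_dir() {
  local lib_dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$lib_dir:"*) ;;  # already present, nothing to do
    *) export LD_LIBRARY_PATH="$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
}

# Example usage, after copying the missing .so files into an assumed folder:
#   add_flow_lib_dir /home/flow/lib
#   ./partek_flow/start_flow.sh
```

Calling the helper from the same shell (or startup file) that launches Flow is what makes the folder visible when the server or worker starts.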

HTSeq

Requires Python 2.7, or Python 3.4 or later.

On startup, Flow attempts to install HTSeq using pip.

MACS3

Requires Python 3.0 or later.

Python

If there are any conflicts with preinstalled python packages, Flow should be configured to run with its own virtual environment:

R

R can usually be installed from the package manager. If the user installs Flow via apt or yum it should already be installed.

Currently, we offer a set of R packages compatible with some versions of R

Extract this file in the home directory. (Make .R a symlink if the home directory doesn't have enough free space)

These packages include the dependencies for both CNVkit and DESeq2

When running R diagnostic commands outside Flow, it can simplify things if the environment includes a reference to the ~/.R folder:

or load

in ~/.Rprofile

List loaded packages:

Get the version:
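As an illustration of the R diagnostics described above, the sketch below uses standard R facilities only. It assumes that ~/.R is itself an R library folder, which may not match how the package archive is laid out. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the extracted package set makes ~/.R usable directly as an R library folder.
# R_LIBS_USER is the standard R environment variable for a per-user library path.
# A comparable line in ~/.Rprofile would be: .libPaths(c("~/.R", .libPaths()))
use_flow_r_libs() {
  export R_LIBS_USER="${1:-$HOME/.R}"
}

# Print the library search path and the namespaces loaded in a fresh R session.
show_r_packages() {
  Rscript -e 'print(.libPaths()); print(loadedNamespaces())'
}

# Print the version line of the R binary found on the PATH.
show_r_version() {
  R --version | head -n 1
}

# Example usage:
#   use_flow_r_libs
#   show_r_packages
#   show_r_version
```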

Variant Effect Predictor

This is a compiled Perl script, so it has no direct dependency on Perl itself. We have had one report (istem.fr) of it failing to run.

DECoN

DECoN requires R version 3.1.2.

R 3.1.2 must be installed under /opt/R-3.1.2; otherwise, set the DECON_R environment variable to its folder.

Download DECoN and install it under /opt/DECoN, or set the DECON_PATH environment variable to its folder.

You may need to add

to Linux/packrat/packrat.opts
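The DECoN location rules above (default folders, overridden by DECON_R and DECON_PATH) can be checked before starting Flow. The helper below is a hypothetical sketch. It only reports whether the folders exist and does not validate the R version or the DECoN install itself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: resolve the DECoN folders the same way the text
# describes (environment variable first, then the default) and report on each.
check_decon_paths() {
  local r_dir="${DECON_R:-/opt/R-3.1.2}"
  local decon_dir="${DECON_PATH:-/opt/DECoN}"
  local dir rc=0
  for dir in "$r_dir" "$decon_dir"; do
    if [ -d "$dir" ]; then
      echo "found: $dir"
    else
      echo "missing: $dir"
      rc=1
    fi
  done
  return "$rc"
}

# Example usage with a non-default R location (path is an assumption):
#   export DECON_R=/data/tools/R-3.1.2
#   check_decon_paths
```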

Docker and Docker-compose

Docker

Docker can be used alongside Partek Flow to deploy an easy-to-maintain environment that avoids dependency issues and is easy to relocate to a different server if needed.

One can follow the Docker documentation to install and get started.

“Docker is a platform for developers and sysadmins to build, run, and share applications with containers. The use of containers to deploy applications is called containerization.”

Useful commands

This command will output the details of the currently running containers including port forwarding, container name/id, and uptime.

This command will allow us to enter the running container’s environment to troubleshoot any issues we might have with the container. Containers are not meant to be changed in place; the correct way to deal with an issue is to create a new container after troubleshooting.
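The two descriptions above match the standard Docker CLI commands docker ps and docker exec. The sketch below wraps them in hypothetical helper functions. The container name and the presence of /bin/bash inside the image are assumptions.

```shell
#!/usr/bin/env bash
# DOCKER can be overridden, for example to point at a different container CLI.
DOCKER="${DOCKER:-docker}"

# List running containers with their IDs, names, port mappings and uptime.
list_flow_containers() {
  "$DOCKER" ps
}

# Open an interactive shell in a running container for troubleshooting.
# Assumption: the image provides /bin/bash.
enter_flow_container() {
  local container="$1"
  "$DOCKER" exec -it "$container" /bin/bash
}

# Example usage (container name is hypothetical):
#   list_flow_containers
#   enter_flow_container flowserver
```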

Docker-compose

“Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.”

Below is an example of a docker-compose.yml file that can be used to bring up a Partek Flow server with an extra worker.

These are some of the important tags shown above:

restart: whether the container should be restarted automatically upon failure or system restart.
image: the image we distribute and the desired version. We always recommend running the latest version of the software; if you need a specific version of Partek Flow, please visit here.
environment: environment variables to set inside the container.
ports: the default port for Partek Flow is 8080. To make the server accessible on a different port, change the first part (left of the colon) of 8080:8080. For example, to access the server on port 8888, use 8888:8080.
mac_address: this needs to match your license file.
volumes: the folders on the server to be shared with the container, so that the files created and used in the container persist and remain accessible.
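To make the tags concrete, the sketch below writes a single-service docker-compose.yml from the shell. It is an illustration, not the file distributed by Partek. The service name, the MAC address and the restart policy are placeholders, the image tag must be replaced with a real one, and the license variable is carried over from the Kubernetes example in this guide. The extra worker is left out because its settings are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal compose file for a Flow server.
# $1 = folder to write into, $2 = host port to publish (defaults to 8080).
write_flow_compose() {
  local out_dir="$1" host_port="${2:-8080}"
  cat > "$out_dir/docker-compose.yml" <<EOF
services:
  flowserver:                                      # hypothetical service name
    image: registry.partek.com/rtw:YY.MMMM.build   # replace with a real tag
    restart: unless-stopped
    environment:
      - PARTEKLM_LICENSE_FILE=/home/flow/.partekflow/license/Partek.lic
    ports:
      - "${host_port}:8080"                        # host port : container port
    mac_address: "02:42:ac:11:00:02"               # placeholder, must match your license file
    volumes:
      - /home/flow:/home/flow                      # host folder : container folder
EOF
}

# Example usage: publish Flow on host port 8888, then start it.
#   write_flow_compose . 8888
#   docker-compose up -d
```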

Java KeyStore and Certificates

Java Keystore

The Java KeyStore (JKS) is used in Flow for specific scenarios that require asymmetric encryption.

Partek Flow ships with its own Java KeyStore. The file is located at .../partek_flow/distrib/flowkeystore, and you can add your public and private certificates to it.

Adding a certificate to the KeyStore

If you already have a certificate please skip to the next step.

Create a certificate

Please place the key in a secure folder. It is advisable to place it in Flow's home directory, e.g. /home/flow/keys.

The commands above are meant to be run in a terminal. There are other ways to create a certificate, but they are not covered here.

If you wish to understand the flags used above please refer to the OpenSSL documentation.
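For readers who need a starting point, a self-signed certificate can be created with OpenSSL roughly as follows. This is a sketch under assumptions and not necessarily the command used by Partek: the function name, file names, key size and validity period are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a self-signed certificate and an unencrypted private key.
# $1 = folder for the key material, $2 = common name (usually the server hostname).
make_flow_cert() {
  local key_dir="$1" common_name="$2"
  mkdir -p "$key_dir" || return 1
  chmod 700 "$key_dir" || return 1   # keep the private key folder readable by its owner only
  openssl req -x509 -newkey rsa:2048 -sha256 -days 365 -nodes \
    -keyout "$key_dir/flow.key" -out "$key_dir/flow.crt" \
    -subj "/CN=$common_name"
}

# Example usage (hostname is an assumption):
#   make_flow_cert /home/flow/keys flow.example.org
```

The -nodes flag leaves the private key unencrypted, which is why the folder permissions are tightened first.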

Import a certificate into flowkeystore

For this step you need to find the cacerts file, which is located under the Java installation. If you do not know how to find it, contact us and we can help.

In the example the cacerts file is located at /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts
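An import with the JDK keytool utility might look like the sketch below. The alias and the helper name are hypothetical, and the keystore path is passed in so the same call can target flowkeystore or cacerts. keytool prompts for the keystore password, which is deliberately not hard-coded here.

```shell
#!/usr/bin/env bash
# KEYTOOL can be overridden to point at a specific JDK's keytool binary.
KEYTOOL="${KEYTOOL:-keytool}"

# Hypothetical helper: import a certificate file into a Java keystore.
# $1 = certificate file, $2 = keystore file, $3 = alias (defaults to flowcert).
import_flow_cert() {
  local cert_file="$1" keystore="$2" alias_name="${3:-flowcert}"
  "$KEYTOOL" -importcert -alias "$alias_name" -file "$cert_file" -keystore "$keystore"
}

# Example usage with the cacerts location from the text:
#   import_flow_cert /home/flow/keys/flow.crt \
#     /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts
```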

Tell the JVM where to find the key

We need to tell Partek Flow where the key is located. To do this, edit the file that contains some of the Flow settings.

The file is usually located at /etc/partekflow.conf. If you do not have this file, we advise using the .bashrc file of the system user that runs Partek Flow.

At the end of that file please add:

Kubernetes

Below are the yaml documents which describe the bare minimum infrastructure needed for a functional Flow server. It is best to start with a single-node proof of concept deployment. Once that works, the deployment can be extended to multi-node with elastic worker allocation. Each section is explained below.

The Flow headnode pod

apiVersion: v1
kind: Pod
metadata:
  name: flowheadnode
  namespace: partek-flow
  labels:
    app.kubernetes.io/name: flowheadnode
    deployment: dev
spec:
  securityContext:
    fsGroup: 1000
  containers:
    - name: flowheadnode
      image: xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/partek-flow:current-23.0809.22
      resources:
        requests:
          memory: "16Gi"
          cpu: 8
      env:
        - name: PARTEKLM_LICENSE_FILE
          value: "@flexlmserver"
        - name: PARTEK_COMMON_NO_TOTAL_LIMITS
          value: "1"
        - name: CATALINA_OPTS
          value: "-DFLOW_WORKER_MEMORY_MB=1024 -DFLOW_WORKER_CORES=2 -Djavax.net.ssl.trustStore=/etc/flowconfig/cacerts -Xmx14g"
      volumeMounts:
        - name: home-flow
          mountPath: /home/flow
        - name: flowconfig
          readOnly: true
          mountPath: "/etc/flowconfig"
  volumes:
    - name: home-flow
      persistentVolumeClaim:
        claimName: partek-flow-pvc
    - name: flowconfig
      secret:
        secretName: flowconfig

Pod metadata

On a Kubernetes cluster, all Flow deployments are placed in their own namespace, for example namespace: partek-flow. The label app.kubernetes.io/name: flowheadnode allows a service to bind to this headnode pod and can be used to target other Kubernetes infrastructure at it. The label deployment: dev allows running multiple Flow instances in this namespace (dev, tst, uat, prd, etc.) if needed and allows workers to connect to the correct headnode. For stronger isolation, run each Flow instance in its own namespace.

Data storage

The Flow docker image has three storage requirements: 1) a writable volume mounted to /home/flow; 2) this volume must be readable and writable by UID:GID 1000:1000; 3) for a multi-node setup, this volume must be cross-mounted to all worker pods. In the multi-node case, the persistent volume would be backed by a network storage device such as EFS, NFS, or a mounted FileGateway.

This section achieves goal 2)

spec:
  securityContext:
    fsGroup: 1000

The flowconfig volume is used to override behavior for custom Flow builds and custom integrations. It is generally not needed for vanilla deployments.

The Flow docker image

Partek Flow is shipped as a single docker image containing all necessary dependencies. The same image is used for worker nodes. Most deployment-related configuration is set as environment variables. Auxiliary images are available for additional supporting infrastructure, such as flexlm and worker allocator images.

Official Partek Flow images can be found on our release notes page (Release Notes). The image tags assume the format registry.partek.com/rtw:YY.MMMM.build. New rtw images are generally released several times a month. The image in the example above references a private ECR. We highly recommend loading the target image from registry.partek.com into your own ECR. Image pulls will be much faster from AWS, which reduces the time needed to dynamically allocate workers. It also removes a single point of failure: if registry.partek.com were down, it would impact your ability to launch new workers on demand.

Flow headnode resource request

Partek Flow uses the head node to handle all interactive data visualization. Additional CPU resources are needed for this, the more the better and 8 is a good place to start. As for memory, we recommend 8 to 16 GiB. Resource limits are not included here, but are set to large values globally:

# This allows us to create pods with only a request set, but not a limit set. Further tuning is recommended. 
apiVersion: v1
kind: LimitRange
metadata:
  name: partek-flow-limit-range
spec:
  limits:
    - max:
        memory: 512Gi
        cpu: 64
      default:
        memory: 512Gi
        cpu: 64
      defaultRequest:
        memory: 4Gi
        cpu: 2
      type: Container

Relevant Flow headnode environment variables

PARTEKLM_LICENSE_FILE

Partek Flow uses FlexLM for licensing. Currently we do not offer any alternative. Values for this environment variable can be:

@flexlmserveraddress

An external flexlm server. We provide a Partek-specific container image and detail a Kubernetes deployment for this below. This license server can also live outside the Kubernetes cluster; the only requirement is that it is network accessible.

/home/flow/.partekflow/license/Partek.lic

Use this path exactly. This path is internal to the headnode container and is persisted on a mounted PVC.

Unfortunately, FlexLM is MAC address based and does not quite fit in with modern containerized deployments. There is no straightforward or native way for kubernetes to set the MAC address upon pod/container creation, so using a license file on the flowheadnode pod (/home/flow/.partekflow/license/Partek.lic ) could be problematic (but not impossible). In further examples below, we provide a custom FlexLM container that can be instantiated as a pod/service. This works by creating a new network interface with the requested MAC address inside the FlexLM pod.

PARTEK_COMMON_NO_TOTAL_LIMITS

Please leave this set at "1". Partek Flow need not enforce any limits as that is the responsibility of kubernetes. Setting this to anything else may result in Partek executables hanging.

CATALINA_OPTS

This is a hodgepodge of Java/Tomcat options. Parts of interest:

-DFLOW_WORKER_MEMORY_MB=1024 -DFLOW_WORKER_CORES=2

It is possible for the Flow headnode to execute jobs locally in addition to dispatching them to remote workers. These two options set resource limits on the Flow internal worker to prevent resource contention with the Flow server. If remote workers are not used and this remains a single-node deployment, meaning ALL jobs will execute on the internal worker, then it is best to remove the CPU limit (-DFLOW_WORKER_CORES) and only set -DFLOW_WORKER_MEMORY_MB equal to the kubernetes memory resource request.

-Djavax.net.ssl.trustStore=/etc/flowconfig/cacerts

If Flow connects to a corporate LDAP server for authentication, it will need to trust the LDAP certificates.

-Xmx14g

JVM heap size. If the internal worker is not used, set this to be a little less than the Kubernetes memory resource request. If the internal worker is in use, and the intent is to stay with a single-node deployment, then set this to be ~ 25% of the Kubernetes memory resource request, but no less than ~ 4 GiB.
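The heap sizing rule above can be written as a small calculation. In this sketch, "a little less" is taken to mean 2 GiB of headroom. That figure is an assumption inferred from the example manifest, which pairs a 16Gi request with -Xmx14g. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest an -Xmx value from the memory request in GiB.
# $1 = memory resource request in whole GiB, $2 = "yes" if the internal worker is in use.
flow_xmx_option() {
  local request_gib="$1" internal_worker="$2" xmx
  if [ "$internal_worker" = "yes" ]; then
    # About 25% of the request, but never less than 4 GiB.
    xmx=$(( request_gib / 4 ))
    if [ "$xmx" -lt 4 ]; then xmx=4; fi
  else
    # Assumption: leave 2 GiB of headroom below the request.
    xmx=$(( request_gib - 2 ))
    if [ "$xmx" -lt 1 ]; then xmx=1; fi
  fi
  echo "-Xmx${xmx}g"
}

# Example usage:
#   flow_xmx_option 16 no    # headnode that dispatches to remote workers
#   flow_xmx_option 64 yes   # single node running the internal worker
```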

The Flow headnode service definition

apiVersion: v1
kind: Service
metadata:
  name: flowheadnode
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
    - port: 2552
      targetPort: 2552
      protocol: TCP
      name: akka
    - port: 8443
      targetPort: 8443
      protocol: TCP
      name: licensing
  selector:
    app.kubernetes.io/name: flowheadnode

The flowheadnode service is needed 1) so that workers have a DNS name (flowheadnode) to connect to when they start and 2) so that we can attach an ingress route to make the Flow web interface accessible to end users. The app.kubernetes.io/name: flowheadnode selector is what binds this to the flowheadnode pod.

80:8080 - Users interact with Flow entirely over a web browser
2552:2552 - Workers communicate with the Flow server over port 2552
8443:8443 - Partek executed binaries connect back to the Flow server over port 8443 to do license checks

Ingress to flowheadnode

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flowheadnode
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  rules:
    - host: flow.dev-devsvc.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flowheadnode
                port:
                  number: 80

This provides external users HTTPS access to Flow at the host flow.dev-devsvc.domain.com. Your details will vary. This is where we bind to the flowheadnode service.

The flexlm service pod

# On a NEW deployment, you need to exec into this pod and add the license file
# to /usr/local/flexlm/licenses
# After a license file is present, the flexlm daemon will start automatically

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flexlmserver-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi     # flex.log is the only thing that slowly grows here
  storageClassName: gp2-ebs-sc
  volumeMode: Filesystem
---
apiVersion: v1
kind: Service
metadata:
  name: flexlmserver
spec:
  type: ClusterIP
  ports:
    - port: 27000
      targetPort: 27000
      protocol: TCP
      name: flexmain
    - port: 27001
      targetPort: 27001
      protocol: TCP
      name: flexvendor
  selector:
    app.kubernetes.io/name: flexlmserver
---
apiVersion: v1
kind: Pod
metadata:
  name: flexlmserver
  namespace: partek-flow
  labels:
    app.kubernetes.io/name: flexlmserver
spec:
  containers:
    - name: flexlmserver
      image: public.ecr.aws/partek-flow/kube-flexlm-server
      ports:
        - containerPort: 27000
        - containerPort: 27001
      resources:
        limits:
          memory: "256Mi"
          cpu: 1
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]
      volumeMounts:
        - name: flexlmserver-pvc
          mountPath: /usr/local/flexlm/licenses
  volumes:
    - name: flexlmserver-pvc
      persistentVolumeClaim:
        claimName: flexlmserver-pvc

The yaml documents above will bring up a complete Partek-specific license server.

Note that the service name is flexlmserver. The flowheadnode pod connects to this license server via the PARTEKLM_LICENSE_FILE="@flexlmserver" environment variable.

You should deploy this flexlmserver first, since the flowheadnode will need it available in order to start in a licensed state.

Partek will send a Partek.lic file licensed to some random MAC address. When this license is (manually) written to /usr/local/flexlm/licenses, the pod will continue execution by creating a new network interface using the MAC address in Partek.lic, then it will start the licensing service. This is why the NET_ADMIN capability is added to this pod.

The license from Partek must contain VENDOR parteklm PORT=27001 so the vendor port remains at 27001 in order to match the service definition above. Without this, this port is randomly set by FlexLM.

This image is currently available from public.ecr.aws/partek-flow/kube-flexlm-server but this may change in the future.
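The deployment order described above (license server first, then the license file, then the headnode) could be scripted along these lines. This is a sketch under assumptions: the manifest file names are hypothetical, kubectl cp is used as an alternative to exec-ing into the pod and requires tar inside the container, and the wait step assumes the pod reports Ready while it is still waiting for a license.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden, for example with a wrapper that selects a cluster context.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: bring up the license server before the Flow headnode.
# $1 = flexlm manifest, $2 = headnode manifest, $3 = license file (all assumed names).
deploy_flow() {
  local flexlm_yaml="${1:-flexlmserver.yaml}"
  local headnode_yaml="${2:-flowheadnode.yaml}"
  local license_file="${3:-Partek.lic}"
  local ns="partek-flow"
  "$KUBECTL" apply -n "$ns" -f "$flexlm_yaml" || return 1
  "$KUBECTL" wait -n "$ns" --for=condition=Ready pod/flexlmserver --timeout=300s || return 1
  # Writing the license starts the flexlm daemon inside the pod.
  "$KUBECTL" cp "$license_file" "$ns/flexlmserver:/usr/local/flexlm/licenses/Partek.lic" || return 1
  "$KUBECTL" apply -n "$ns" -f "$headnode_yaml"
}

# Example usage:
#   deploy_flow
```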

Multi-Node Cluster Installation

Prior to installation, make sure you have the license key related to the host-ID of the compute cluster the software will be installed in. Contact licensing@partek.com for key generation.

Installation on a Computer Cluster
Integration with your queueing system
Bringing up workers
Shutting down workers
Updating Partek Flow

Installation on a Computer Cluster

Log in to the flow account and change to the flow home directory

cd /home/flow

Download Partek Flow and the remote worker package

wget --content-disposition http://packages.partek.com/linux/flow

wget --content-disposition http://packages.partek.com/linux/flow-worker

Unzip these files into the flow home directory /home/flow. This yields two directories: partek_flow and PartekFlowRemoteWorker
Partek Flow can generate large amounts of data, so it needs to be configured to store the bulk of this data in the largest shared data store available. For this guide we assume that the directory is located at /shared. Adjust this path accordingly.
It is required that the Partek Flow server (which runs on the head node) and remote workers (which run on the compute nodes) see identical file system paths for any directory Partek Flow has read or write access to. Thus /shared and /home/flow must be mounted on the Flow server and all compute nodes. Create the directory /shared/FlowData and allow the flow Linux account write access to it
It is assumed the head node is attached to at least two separate networks: (1) a public network that allows users to log in to the head node and (2) a private backend network that is used for communication between compute nodes and the head node. Clients connect to the Flow web server on port 8080 so adjust the firewall to allow inbound connections to 8080 over the public network of the head node. Partek Flow will connect to remote workers over your private network on port 2552 and 8443, so make sure those ports are open to the private network on the flow server and workers.
Partek Flow needs to be informed of what private network to use for communication between the server and workers. It is possible that there are several private networks available (gigabit, infiniband, etc.) so select one to use. We recommend using the fastest network available. For this guide, let's assume that private network is 10.1.0.0/16. Locate the headnode hostname that resolves to an address on the 10.1.0.0/16 network. This must resolve to the same address on all compute nodes.
For example:

host head-node.local yields 10.1.1.200

Open /home/flow/.bashrc and add this as the last line:

export CATALINA_OPTS="$CATALINA_OPTS -Djava.awt.headless=true -DflowDispatcher.flow.command.hostname=head-node.local -DflowDispatcher.akka.remote.netty.tcp.hostname=head-node.local"

Source .bashrc so the environment variable CATALINA_OPTS is accessible.

NOTE: If workers are unable to connect (below), then replace all hostnames with their respective IPs.

Start Partek Flow

~/partek_flow/start_flow.sh

You can monitor progress by tailing the log file partek_flow/logs/catalina.out. After a few minutes, the server should be up.
Make sure the correct ports are bound

netstat -tulpn

You should see 10.1.1.200:2552 and :::8080 in the LISTEN state. Inspect catalina.out for additional error messages.
Open a browser and go to http://localhost:8080 on the head node to configure the Partek Flow server.
Enter the license key provided (Figure 1)

Figure 1. Setting up the Partek Flow license during installation

If there appears to be an issue with the license or there is a message about 'no workers attached', then restart Partek Flow. It may take 30 sec for the process to shut down. Make sure the process is terminated before starting the server back up:

~/partek_flow/stop_flow.sh

Then run:

~/partek_flow/start_flow.sh

You will now be prompted to set up the Partek Flow admin user (Figure 2). Specify the username (admin), password and email address for the administrator account and click Next

Figure 2. Setting up the Partek Flow 'admin' account during installation

Select a directory to store the library files that will be downloaded or generated by Partek Flow (Figure 3). All Partek Flow users share library files and the size of the library folder can grow significantly. We recommend allocating at least 100GB of free space for library files. The free space in the selected library file directory is shown. Click Next to proceed. You can change this directory after installation by changing system preferences. For more information, see Library file management.

Figure 3. Selecting the library file directory

To set up the Partek Flow data paths, click on Settings located on the top-right of the Flow server webpage. On the left, click on Directory permissions then Permit access to a new directory. Add /shared/PartekFlow and allow all users access.
Next click on System preferences on the left menu and change data download directory and default project output directory to /shared/PartekFlow/downloads and /shared/PartekFlow/project_output respectively

Note: If you do not see the /shared folder listed, click on the Refresh folder list link that is toward the bottom of the download directory dialog

Since you do not want to run any work on the head node, go to Settings>System preferences>Task queue and job processing and uncheck Start internal worker at Partek Flow server startup.
Restart the Flow server:

~/partek_flow/stop_flow.sh

After 30 seconds, run:

~/partek_flow/start_flow.sh

This is needed to disable the internal worker.

Test that remote workers can connect to the Flow server
Log in as the flow user to one of your compute nodes. Assume the hostname is compute-0. Since your home directory is exported to all compute nodes, you should be able to go to /home/flow/PartekFlowRemoteWorker/
To start the remote worker:

./partekFlowRemoteWorker.sh head-node.local compute-0

These two addresses should both be in the 10.1.0.0/16 address space. The remote worker will output to stdout when you run it. Scan for any errors. You should see the message woot! I'm online.
A successfully connected worker will show up on the Resource management page on the Partek Flow server. This can be reached from the main homepage or by clicking Resource management from the Settings page. Once you have confirmed the worker can connect, kill the remote worker (CTRL-C) from the terminal in which you started it.
Once everything is working, return to library file management and add the genomes/indices required by your research team. If Partek hosts these genomes/indices, these will automatically be downloaded by Partek Flow
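The port check from the steps above (the netstat -tulpn output should show listeners on 2552 and 8080) can be automated with a filter like the one below. It is a sketch: the function name is hypothetical, and it only inspects text passed on standard input, so it works with saved output as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read netstat-style output on stdin and report which of the
# ports used by the Flow server (2552 for workers, 8080 for browsers) are not listening.
check_flow_ports() {
  local listing port missing=0
  listing=$(cat)
  for port in 2552 8080; do
    if ! printf '%s\n' "$listing" | grep -E ":${port}[[:space:]].*LISTEN" >/dev/null; then
      echo "port $port is not listening"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage on the head node:
#   netstat -tulpn | check_flow_ports
```

No output and an exit status of 0 mean both ports are bound.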

Integration with your queueing system

In effect, all you are doing is submitting the following command as a batch job to bring up remote workers:

/home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh head-node.local compute-0

The second parameter for this script can be obtained automatically via:

$(hostname -s)
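One way to package this for a scheduler is to generate a small job script, so that hostname -s is evaluated on the compute node that runs the job rather than on the node that submits it. The sketch below is scheduler-agnostic. The helper name and the job file name are hypothetical, and the submit commands in the usage comment assume Slurm or a qsub-based system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a job script that starts one remote worker.
# $1 = job script to create, $2 = head node hostname on the private network.
write_worker_job() {
  local job_file="$1" head_node="$2"
  # The escaped \$(hostname -s) is written literally and expands on the compute node.
  cat > "$job_file" <<EOF
#!/bin/bash
exec /home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh $head_node "\$(hostname -s)"
EOF
  chmod +x "$job_file"
}

# Example usage:
#   write_worker_job flow_worker_job.sh head-node.local
#   sbatch flow_worker_job.sh      # Slurm (assumed)
#   qsub flow_worker_job.sh        # SGE/PBS-style schedulers (assumed)
```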

Bringing up workers

Bring up workers by running the command below. You only need to run one worker per node:

/home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh head-node.local compute-0

Shutting down workers

Updating Partek Flow

Go to the Flow installation directory. This is usually the home directory of the Linux user that runs Flow and it should contain a directory named "partek_flow". The location of the Flow install can also be obtained by running ps aux | grep flow and examining the path of the running Flow executable.
Shut down Flow:

./partek_flow/stop_flow.sh

Download the new version of Flow and the Flow worker:

wget --content-disposition http://packages.partek.com/linux/flow-release

wget --content-disposition http://packages.partek.com/linux/flow-worker-release

Make sure Flow has exited:

ps aux | grep flow

The flow process should no longer be listed.

Back up the old install and unpack the new version of Flow:

mv partek_flow partek_flow_prev

mv PartekFlowRemoteWorker PartekFlowRemoteWorker_prev

Back up the Flow database folder. It should be located in the home directory of the user that runs Flow.

tar -czvf partek-db-bkp-date.tgz ~/.partekflow

Start the updated version of Flow:

./partek_flow/start_flow.sh

tail -f partek_flow/logs/catalina.out

Make sure there is nothing of concern in this file when starting up Flow. You can stop tailing the file by pressing CTRL-C.

You may also want to examine the main Flow log for errors:

~/.partekflow/logs/flow.log
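The database backup step in this procedure uses a literal "date" in the archive name. A minimal sketch of a helper that fills in the current date is shown below. The function name and the YYYYMMDD format are assumptions, and the helper archives the folder relative to its parent so the archive unpacks as .partekflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive the Flow database folder into a date-stamped .tgz file.
# $1 = database folder (defaults to ~/.partekflow), $2 = destination folder (defaults to cwd).
backup_flow_db() {
  local db_dir="${1:-$HOME/.partekflow}" dest_dir="${2:-$PWD}"
  local archive
  archive="$dest_dir/partek-db-bkp-$(date +%Y%m%d).tgz"
  # -C switches to the parent folder so the archive holds relative paths.
  tar -czf "$archive" -C "$(dirname "$db_dir")" "$(basename "$db_dir")" || return 1
  echo "$archive"
}

# Example usage, with Flow stopped:
#   backup_flow_db
```

The helper prints the archive path so it can be recorded or copied to another location.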

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Single Node Amazon Web Services Deployment

Creating a New Elastic Compute Cloud Instance for Partek Flow Software
Enabling External Access to the Partek Flow Elastic Compute Cloud Instance
Attaching the Amazon Elastic Block Store Volume for Partek Flow Data Storage
Installing Partek Flow on a New Elastic Compute Cloud Instance
Partek Amazon Web Services Support
General Recommendations
Amazon Web Services Instance Type Resources and Costs
Elastic Block Store Volumes

Creating a New Elastic Compute Cloud Instance for Partek Flow Software

Click on EC2

Switch to the region where you intend to deploy Partek Flow. This tutorial uses US East (N. Virginia) as an example.

On the left menu, click on Instances, then click the Launch Instance button. The Choose an Amazon Machine Image (AMI) page will appear.

Click the Select button next to Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-f4cc1de2. NOTE: Please use the latest Ubuntu AMI. It is likely that the AMI listed here will be out of date.

NOTE: New instance types will become available. Please use the latest mX instance type provided as it will likely perform better and be more cost effective than older instance types.

On the Configure Instance Details page, make the following selections:

Set the number of instances to 1. An autoscaling group is not necessary for single-node deployments
Purchasing Option: Leave Request Spot Instances unchecked. This is relevant for cost-minimization of Partek Flow cluster deployments.
Network: If you do not have a virtual private cloud (VPC) already created for Partek Flow, click Create New VPC. This will open a new browser tab for VPC management.
- Use the following settings for the VPC:
  - Name Tag: Flow-VPC
  - IPv4 CIDR block: 10.0.0.0/16
  - Select No IPv6 CIDR Block
  - Tenancy: Default
- Click Yes, Create. You may be asked to select a DHCP Option set. If so, then make sure the dynamic host configuration protocol (DHCP) option set has the following properties:
  - Options: domain-name = ec2.internal;domain-name-servers = AmazonProvidedDNS;
  - DNS Resolution: leave the defaults set to yes
  - DNS Hostname: change this to yes as internal DNS resolution may be necessary depending on the Partek Flow deployment
- Once created, the new Flow-VPC will appear in the list of available VPCs. The VPC needs additional configuration for external access. To continue, right click on Flow-VPC and select Edit DNS Resolution, select Yes, and then Save. Next, right click the Flow-VPC and select Edit DNS Hostnames, select Yes, then Save.
- Make sure the DHCP option set is set to the one created above. If it is not, right-click on the row containing Flow-VPC and select Edit DHCP Option Sets.
- Close the VPC Management tab and go back to the EC2 Management Console.
Click the refresh arrow next to Create New VPC and select Flow-VPC.
Click Create New Subnet and a new browser tab will open with a list of existing subnets. Click Create Subnet and set the following options:
- Name Tag: Flow-Subnet
- VPC: Flow-VPC
- VPC CIDRs: This should be automatically populated with the information from Flow-VPC
- Availability Zone: It is OK to let Amazon choose for you if you do not have a preference
- IPv4 CIDR block: 10.0.1.0/24
Stay on the VPC Dashboard Tab and on the left navigation menu, click Internet Gateways, then click Create Internet Gateway and use the following options:
- Name Tag: Flow-IGW
- Click Yes, Create
The new gateway will be displayed as Detached. Right click on the Flow-IGW gateway and select Attach to VPC, then select Flow-VPC and click Yes, Attach.
Click on Route Tables on the left navigation menu.
If a route table is already associated with Flow-VPC, select it. If not, create a new route table and associate it with Flow-VPC. Click on the route table, then click the Routes tab toward the bottom of the page. The route Destination = 10.0.0.0/16 Target = local should already be present. Click Edit, then click Add another route and set the following parameters:
- Destination: 0.0.0.0/0
- Target: Flow-IGW (the internet gateway that was just created)
Click Save
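For readers who prefer the command line, the gateway attachment and default route above could also be made with the AWS CLI. This is only a sketch under assumptions: the resource IDs are hypothetical placeholders that must be replaced with the IDs shown in the VPC console, and the run wrapper is an illustrative helper that prints each command rather than executing it unless DRY_RUN=0 is set.

```shell
# Placeholder resource IDs (hypothetical); substitute the real IDs shown in
# the VPC console for Flow-VPC, its route table, and Flow-IGW.
VPC_ID="vpc-REPLACE_ME"
RTB_ID="rtb-REPLACE_ME"
IGW_ID="igw-REPLACE_ME"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Attach the gateway to the VPC, then route all non-local traffic through it.
attach_gateway_and_route() {
    run aws ec2 attach-internet-gateway \
        --internet-gateway-id "$IGW_ID" --vpc-id "$VPC_ID"
    run aws ec2 create-route --route-table-id "$RTB_ID" \
        --destination-cidr-block 0.0.0.0/0 --gateway-id "$IGW_ID"
}

attach_gateway_and_route
```

Reviewing the printed commands first makes it easy to confirm the IDs before anything is changed in your account.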
Close the VPC Dashboard browser tab and go back to the EC2 Management Console tab. Note that you should still be on Step 3: Configure Instance Details.

Click the refresh arrow next to Create New Subnet and select Flow-Subnet.

Auto-assign Public IP: Use subnet setting (Disable)

Placement Group: No placement group

IAM role: None.

Shutdown Behaviour: Stop

Enable Termination Protection: select Protect against accidental termination

Monitoring: leave Enable CloudWatch Detailed Monitoring disabled

EBS-optimized Instance: Make sure Launch as EBS-optimized Instance is enabled. Given the recommended choice of an m4 instance type, EBS optimization should be enabled at no extra cost.

Tenancy: Shared - Run a shared hardware instance

Network Interfaces: leave as-is

Advanced Details: leave as-is

Click Next: Add Storage. You should be on Step 4: Add Storage

For the existing root volume, set the following options:

Size: 8 GB
Volume Type: Magnetic
Select Delete on Termination
- Note: All Partek Flow data is stored on a non-root EBS volume. Since the root volume holds only the OS and the instance is rarely rebooted, a fast root volume is probably not worth the cost. For more information about EBS volumes and their performance, see the section EBS volumes.

Click Add New Volume and set the following options:

Volume Type: EBS
Device: /dev/sdb (take the default)
Do not define a snapshot
Size (GiB): 500
- Note: This is the minimum for ST1 volumes, see: EBS volumes
Volume Type: Throughput optimized HDD (ST1)
Leave Delete on Termination and encryption unselected

Click Next: Add Tags

You do not need to define any tags for this new EC2 instance, but you can if you would like.

Click Next: Configure Security Group

For Assign a Security Group select Create a New Security Group
Security Group Name: Flow-SG
Description: Security group for Partek Flow server
Add the following rules:
- SSH: set Source to My IP (or the address range of your company or institution)
- Click Add Rule, then configure the new rule:
  - Set Type to Custom TCP Rule
  - Set Port Range to 8080
  - Set Source to Anywhere (0.0.0.0/0, ::/0)
  - Note: We recommend restricting Source to only the addresses that need access to Partek Flow.

Click Review and Launch

The AWS console will suggest that this server not be booted from a magnetic volume. Since there is little I/O on the root partition and reboots will be rare, choosing Continue with Magnetic will reduce costs. An SSD volume will not provide substantial benefit, but it is fine to use one if you prefer. See the EBS Volumes section for more information.

Click Launch

Create a new keypair:

Name the keypair Flow-Key
Download this keypair, then run chmod 600 Flow-Key.pem (the downloaded key) so it can be used.
Back up this key; without it, you may lose access to the Partek Flow instance.

The new instance will now boot. Use the left navigation bar and click on Instances. Click the pencil icon and assign the instance the name Partek Flow Server.

Enabling External Access to the Partek Flow Elastic Compute Cloud Instance

The server should be assigned a fixed IP address. To do this, click on Elastic IPs on the left navigation menu from the EC2 Management Console.

Click Allocate New Address
Assign Scope to VPC
Click Allocate

On the table containing the newly allocated elastic IP, right click and select Associate Address

For Instance, select the instance named Partek Flow Server
For Private IP, select the one private IP available for the Partek Flow EC2 instance, then click Associate

Note: For the remaining steps, we refer to the Elastic IP address as elastic.ip

SSH to the new Partek Flow Server instance:

$ chmod 600 Flow-Key.pem

$ ssh -i Flow-Key.pem ubuntu@elastic.ip
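The chmod step matters because ssh ignores a private key that is readable by group or others. The sketch below shows one way to check a key file before connecting. It assumes GNU stat (as found on Linux), and key_is_private is an illustrative helper name; a temporary file stands in for Flow-Key.pem.

```shell
# Succeed only if the key file grants no permissions to group or others.
# Assumes GNU stat (Linux).
key_is_private() {
    mode=$(stat -c %a "$1") || return 1
    case $mode in
        ?00) return 0 ;;   # e.g. 600 or 400
        *)   return 1 ;;
    esac
}

# Example with a throwaway file standing in for Flow-Key.pem.
demo_key=$(mktemp)
chmod 644 "$demo_key"
key_is_private "$demo_key" || echo "too open: ssh would refuse this key"
chmod 600 "$demo_key"
key_is_private "$demo_key" && echo "ok: safe to pass to ssh -i"
rm -f "$demo_key"
```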

Attaching the Amazon Elastic Block Store Volume for Partek Flow Data Storage

$ sudo su

$ mkfs -t ext4 /dev/xvdb

Make a note of the newly created UUID for this volume

Copy the ubuntu home directory onto the EBS volume using a temporary mount point:

$ mount -t ext4 /dev/xvdb /mnt/

$ rsync -avr /home/ /mnt/

$ umount /mnt/

Make the EBS volume mount at system boot:

Add the following line to /etc/fstab:

UUID=the-UUID-from-the-mkfs-command-above /home ext4 defaults,nofail 0 2

$ mount -a

Disconnect the ssh session, then log in again to make sure all is well
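If you would rather not edit /etc/fstab by hand, the entry can be generated from the volume's UUID. The following is a minimal sketch, not an official Partek procedure: add_home_mount is an illustrative helper that appends the same line shown above and skips the write when an entry for that UUID already exists.

```shell
# Append the /home mount entry for the given UUID to an fstab file,
# unless an entry for that UUID is already present.
add_home_mount() {
    uuid=$1
    fstab=${2:-/etc/fstab}
    if grep -q "^UUID=$uuid[[:space:]]" "$fstab" 2>/dev/null; then
        return 0
    fi
    printf 'UUID=%s /home ext4 defaults,nofail 0 2\n' "$uuid" >> "$fstab"
}

# On the instance, as root, the UUID can be read back with blkid:
#   add_home_mount "$(blkid -s UUID -o value /dev/xvdb)"
```

Running the helper a second time leaves the file unchanged, which avoids duplicate mount entries if the step is repeated.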

Installing Partek Flow on a New Elastic Compute Cloud Instance

Note: For additional information about Partek Flow installations, see our generic Installation Guide

Install required packages for Partek Flow:

$ sudo apt-get update

$ sudo apt-get install software-properties-common

$ sudo add-apt-repository -y ppa:openjdk-r/ppa

$ sudo apt-get install openjdk-8-jdk python python-pip python-dev zlib1g-dev python-matplotlib r-base python-htseq libxml2-dev perl make gcc g++ zlib1g libbz2-1.0 libstdc++6 libgcc1 libncurses5 libsqlite3-0 libfreetype6 libpng12-0 zip unzip libgomp1 libxrender1 libxtst6 libxi6 debconf

$ sudo pip install --upgrade pip && pip install --upgrade --upgrade-strategy eager --force-reinstall virtualenv numpy pysam cnvkit

Install Partek Flow:

Note: Make sure you are running as the ubuntu user.

$ cd    # we will install Partek Flow in the ubuntu user's home directory

$ wget --content-disposition packages.partek.com/linux/flow-release

$ unzip PartekFlow*.zip

$ ./partek_flow/start_flow.sh

Partek Flow has finished loading when you see INFO: Server startup in xxxxxxx ms in the partek_flow/logs/catalina.out log file. This takes ~30 seconds.
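Rather than watching the log by eye, a small polling loop can wait for the startup message. This is a sketch under assumptions: wait_for_flow is an illustrative helper, the default timeout of 120 seconds is arbitrary, and the match relies only on the "Server startup in" text quoted above.

```shell
# Poll a log file until the startup message appears or the timeout expires.
# $1 = path to catalina.out, $2 = timeout in seconds (default 120).
wait_for_flow() {
    log=$1
    timeout=${2:-120}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -q 'Server startup in' "$log" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Usage, from the ubuntu user's home directory:
#   wait_for_flow partek_flow/logs/catalina.out && echo "Partek Flow is up"
```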

Alternative: Install Flow with Docker. Our base packages are located here: https://hub.docker.com/r/partekinc/flow/tags

Open Partek Flow with a web browser: http://elastic.ip:8080/

Enter license key

Set up the Partek Flow admin account

Leave the library file directory at its default location and check that the free space listed for this directory is consistent with what was allocated for the ST1 EBS volume.

Done! Partek Flow is ready to use.

Partek Amazon Web Services Support

$ curl -F "file=@FlowKey.pem" https://installfeedback.partek.com/fupload

General Recommendations

We recommend HVM virtualization: we have not seen any performance penalty from using it, and non-HVM instance types can come with significant deployment barriers.

Choose an instance type that is EBS-optimized by default, so that you are not charged a surcharge for EBS optimization.

T-class servers, although cheap, may slow responsiveness for the Partek Flow server and generally do not provide sufficient resources.

Amazon Web Services Instance Type Resources and Costs

The values below were updated April 2017. The latest pricing and EC2 resource offerings can be found at http://www.ec2instances.info

Instance Type | Memory | Cores | EBS throughput | Network Performance | Monthly cost

Single server recommendation: m4.xlarge or m4.2xlarge

Network performance values for US-EAST-1 correspond to: Low ~ 50Mb/s, Medium ~ 300Mb/s, High ~ 1Gb/s.

Elastic Block Store Volumes

Choice of a volume type and size:

Max throughput 500 MiB/s

$0.045 per GB-month of provisioned storage ($22.50 per month for 500 GB of storage).
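The monthly storage cost scales linearly with the provisioned size. As a worked example, the sketch below multiplies a volume size by the per-GB rate quoted above. The rate is taken from this guide and may be out of date, and st1_monthly_cost is an illustrative helper name.

```shell
# Estimate the monthly cost of a provisioned volume.
# $1 = size in GB, $2 = price per GB-month (defaults to the rate quoted above).
st1_monthly_cost() {
    awk -v gb="$1" -v rate="${2:-0.045}" 'BEGIN { printf "%.2f\n", gb * rate }'
}

st1_monthly_cost 500    # prints 22.50
```

Pass a second argument to use the current rate for your region, for example st1_monthly_cost 2000 0.045 for a 2 TB volume.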

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Single Node Installation