This document describes how to set up and configure a single-node Partek Flow license.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Partek Flow is a genomics data analysis and visualization software product designed to run on compute clusters. The following instructions assume the most basic setup of Partek Flow and must only be attempted by system administrators who are familiar with Linux-based commands. These instructions are not intended to be comprehensive. Cluster environments vary widely, so there are no 'one size fits all' instructions. The installation procedure on a computer cluster is highly dependent on the type of computer cluster and the environment in which it is located. We can support a large array of Linux distributions and configurations. In all cases, Partek Technical Support will be available to assist with cluster installation and maintenance to ensure compatibility with any cluster environment. Please consult with Partek Licensing Support (licensing@partek.com) for additional information.
Prior to installation, make sure you have the license key related to the host-ID of the compute cluster the software will be installed in. Contact licensing@partek.com for key generation.
Make a standard Linux user account that will run the Partek Flow server and all associated processes. It is assumed this account is synced between the cluster head node and all compute nodes. For this guide, we name the account flow.
Log in to the flow account and cd to the flow home directory.
Download Partek Flow and the remote worker package
Unzip these files into the flow home directory /home/flow. This yields two directories: partek_flow and PartekFlowRemoteWorker
Partek Flow can generate large amounts of data, so it needs to be configured to store the bulk of this data in the largest shared data store available. For this guide we assume that the directory is located at /shared. Adjust this path accordingly.
It is required that the Partek Flow server (which is running on the head node) and remote workers (which are running on the compute nodes) see identical file system paths for any directory Partek Flow has read or write access to. Thus /shared and /home/flow must be mounted on the Flow server and all compute nodes. Create the directory /shared/FlowData and allow the flow Linux account write access to it.
It is assumed the head node is attached to at least two separate networks: (1) a public network that allows users to log in to the head node and (2) a private backend network that is used for communication between compute nodes and the head node. Clients connect to the Flow web server on port 8080, so adjust the firewall to allow inbound connections to 8080 over the public network of the head node. Partek Flow will connect to remote workers over your private network on ports 2552 and 8443, so make sure those ports are open to the private network on the Flow server and workers.
Partek Flow needs to be informed of what private network to use for communication between the server and workers. It is possible that there are several private networks available (gigabit, infiniband, etc.) so select one to use. We recommend using the fastest network available. For this guide, let's assume that private network is 10.1.0.0/16. Locate the head node hostname that resolves to an address on the 10.1.0.0/16 network. This must resolve to the same address on all compute nodes.
For example:
host head-node.local yields 10.1.1.200
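The address check described above can be sketched in Python. This is a minimal illustration that assumes the example values used in this guide (head-node.local and 10.1.0.0/16); the function names are hypothetical and are not part of Partek Flow.

```python
import ipaddress
import socket


def in_private_network(address: str, network: str = "10.1.0.0/16") -> bool:
    """Return True if the IP address lies inside the given CIDR network."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)


def head_node_on_private_network(hostname: str, network: str = "10.1.0.0/16") -> bool:
    """Resolve the hostname and check that it maps into the private network."""
    return in_private_network(socket.gethostbyname(hostname), network)


# The example address from this guide is inside the example network.
print(in_private_network("10.1.1.200"))  # True
```

Running head_node_on_private_network("head-node.local") on the head node and on each compute node is one way to confirm that the name resolves into the same network everywhere.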
Open /home/flow/.bashrc and add this as the last line:
Source .bashrc so the environment variable CATALINA_OPTS is accessible.
NOTE: If workers are unable to connect (below), then replace all hostnames with their respective IPs.
Start Partek Flow
You can monitor progress by tailing the log file partek_flow/logs/catalina.out. After a few minutes, the server should be up.
Make sure the correct ports are bound
You should see 10.1.1.200:2552 and :::8080 as LISTENing. Inspect catalina.out for additional error messages.
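As a complementary check, the short Python sketch below tests whether a TCP port accepts connections. It is an illustration only: the helper name is hypothetical, and a successful connection shows that something is listening on the port as seen from the machine running the check, not that firewall rules are correct for other hosts.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the example values in this guide, port_is_open("10.1.1.200", 2552) and port_is_open("localhost", 8080) would be the calls to try from the head node.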
Open a browser and go to http://localhost:8080 on the head node to configure the Partek Flow server.
Enter the license key provided (Figure 1)
If there appears to be an issue with the license or there is a message about 'no workers attached', then restart Partek Flow. It may take 30 sec for the process to shut down. Make sure the process is terminated before starting the server back up:
Then run:
You will now be prompted to set up the Partek Flow admin user (Figure 2). Specify the username (admin), password and email address for the administrator account and click Next.
Select a directory folder to store the library files that will be downloaded or generated by Partek Flow (Figure 3). All Partek Flow users share library files and the size of the library folder can grow significantly. We recommend allocating at least 100 GB of free space for library files. The free space in the selected library file directory is shown. Click Next to proceed. You can change this directory after installation by changing system preferences. For more information, see Library file management.
To set up the Partek Flow data paths, click on Settings located on the top-right of the Flow server webpage. On the left, click on Directory permissions then Permit access to a new directory. Add /shared/PartekFlow and allow all users access.
Next click on System preferences on the left menu and change data download directory and default project output directory to /shared/PartekFlow/downloads and /shared/PartekFlow/project_output respectively
Note: If you do not see the /shared folder listed, click on the Refresh folder list link that is toward the bottom of the download directory dialog
Since you do not want to run any work on the head node, go to Settings>System preferences>Task queue and job processing and uncheck Start internal worker at Partek Flow server startup.
Restart the Flow server:
After 30 seconds, run:
This is needed to disable the internal worker.
Test that remote workers can connect to the Flow server
Log in as the flow user to one of your compute nodes. Assume the hostname is compute-0. Since your home directory is exported to all compute nodes, you should be able to go to /home/flow/PartekFlowRemoteWorker/
To start the remote worker:
These two addresses should both be in the 10.1.0.0/16 address space. The remote worker will output to stdout when you run it. Scan for any errors. You should see the message woot! I'm online.
A successfully connected worker will show up on the Resource management page on the Partek Flow server. This can be reached from the main homepage or by clicking Resource management from the Settings page. Once you have confirmed the worker can connect, kill the remote worker (CTRL-C) from the terminal in which you started it.
Once everything is working, return to library file management and add the genomes/indices required by your research team. If Partek hosts these genomes/indices, these will automatically be downloaded by Partek Flow
In effect, all you are doing is submitting the following command as a batch job to bring up remote workers:
The second parameter for this script can be obtained automatically via:
Bring up workers by running the command below. You only need to run one worker per node:
Go to the Resource management page and click on the Stop button (red square) next to the worker you wish to shut down. The worker will shut down gracefully: it waits for work currently running on that node to finish, then shuts down.
For a cluster update, Partek support will provide links to .zip files for Partek Flow and the remote Flow worker. All of the following actions should be performed as the Linux user that runs Flow. Do NOT run Flow as root.
Go to the Flow installation directory. This is usually the home directory of the Linux user that runs Flow and it should contain a directory named "partek_flow". The location of the Flow install can also be obtained by running ps aux | grep flow and examining the path of the running Flow executable.
Shut down Flow:
Download the new version of Flow and the Flow worker:
Make sure Flow has exited:
The flow process should no longer be listed.
Unpack the new version of Flow and back up the old install:
Back up the Flow database folder. This should be located in the home directory of the user that runs Flow.
Start the updated version of Flow:
(make sure there is nothing of concern in this file when starting up Flow. You can stop the file tailing by typing: CTRL-C)
You may also want to examine the main Flow log for errors:
Note: This guide assumes that the items necessary for the Amazon Elastic Compute Cloud (EC2) instance, such as an Amazon Virtual Private Cloud (VPC), subnets, and security groups, do not yet exist, so their creation is covered as well.
Log in to the Amazon Web Services (AWS) management console at https://console.aws.amazon.com
Click on EC2
Switch to the region intended to deploy Partek Flow software. This tutorial uses US East (N. Virginia) as an example.
On the left menu, click on Instances, then click the Launch Instance button. The Choose an Amazon Machine Image (AMI) page will appear.
Click the Select button next to Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-f4cc1de2. NOTE: Please use the latest Ubuntu AMI. It is likely that the AMI listed here will be out of date.
Choose an Instance Type. The selection depends on your budget and the size of the Partek Flow deployment. We recommend m4.large for testing or cluster front-end operation, m4.xlarge for standard deployments, and m4.2xlarge for alignment-heavy workloads with a large user-base. See the section AWS instance type resources and costs for assistance with choosing the right instance. In most cases, the instance type and associated resources can be changed after deployment, so one is not locked into the choices made for this step.
NOTE: New instance types will become available. Please use the latest mX instance type provided as it will likely perform better and be more cost effective than older instance types.
On the Configure Instance Details page, make the following selections:
Set the number of instances to 1. An autoscaling group is not necessary for single-node deployments
Purchasing Option: Leave Request Spot Instances unchecked. This is relevant for cost-minimization of Partek Flow cluster deployments.
Network: If you do not have a virtual private cloud (VPC) already created for Partek Flow, click Create New VPC. This will open a new browser tab for VPC management.
Use the following settings for the VPC:
Name Tag: Flow-VPC
IPv4 CIDR block: 10.0.0.0/16
Select No IPv6 CIDR Block
Tenancy: Default
Click Yes, Create. You may be asked to select a DHCP Option set. If so, then make sure the dynamic host configuration protocol (DHCP) option set has the following properties:
Options: domain-name = ec2.internal;domain-name-servers = AmazonProvidedDNS;
DNS Resolution: leave the defaults set to yes
DNS Hostname: change this to yes as internal DNS resolution may be necessary depending on the Partek Flow deployment
Once created, the new Flow-VPC will appear in the list of available VPCs. The VPC needs additional configuration for external access. To continue, right click on Flow-VPC and select Edit DNS Resolution, select Yes, and then Save. Next, right click the Flow-VPC and select Edit DNS Hostnames, select Yes, then Save.
Make sure the DHCP option set is set to the one created above. If it is not, right-click on the row containing Flow-VPC and select Edit DHCP Option Sets.
Close the VPC Management tab and go back to the EC2 Management Console.
Click the refresh arrow next to Create New VPC and select Flow-VPC.
Click Create New Subnet and a new browser tab will open with a list of existing subnets. Click Create Subnet and set the following options:
Name Tag: Flow-Subnet
VPC: Flow-VPC
VPC CIDRs: This should be automatically populated with the information from Flow-VPC
Availability Zone: It is OK to let Amazon choose for you if you do not have a preference
IPv4 CIDR block: 10.0.1.0/24
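The relationship between the two CIDR blocks used in this guide can be checked with Python's standard ipaddress module. This is a minimal sketch; the function name is hypothetical, and the values are the Flow-VPC and Flow-Subnet blocks given above.

```python
import ipaddress


def subnet_fits(vpc_cidr: str, subnet_cidr: str) -> bool:
    """Return True if subnet_cidr lies entirely inside vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(vpc)  # requires Python 3.7 or later


vpc_block, subnet_block = "10.0.0.0/16", "10.0.1.0/24"
print(subnet_fits(vpc_block, subnet_block))              # True
print(ipaddress.ip_network(subnet_block).num_addresses)  # 256
```

A subnet block that falls outside the VPC block would return False here.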
Stay on the VPC Dashboard Tab and on the left navigation menu, click Internet Gateways, then click Create Internet Gateway and use the following options:
Name Tag: Flow-IGW
Click Yes, Create
The new gateway will be displayed as Detached. Right click on the Flow-IGW gateway and select Attach to VPC, then select Flow-VPC and click Yes, Attach.
Click on Route Tables on the left navigation menu.
If it exists, select the route table already associated with Flow-VPC. If not, make a new route table and associate it with Flow-VPC. Click on the new route table, then click the Routes tab toward the bottom of the page. The route Destination = 10.0.0.0/16 Target = local should already be present. Click Edit, then Click Add another route and set the following parameters:
Destination: 0.0.0.0/0
Target set to Flow-IGW (the internet gateway that was just created)
Click Save
Close the VPC Dashboard browser tab and go back to the EC2 Management Console tab. Note that you should still be on Step 3: Configure Instance Details.
Click the refresh arrow next to Create New Subnet and select Flow-Subnet.
Auto-assign Public IP: Use subnet setting (Disable)
Placement Group: No placement group
IAM role: None.
Note: For multi-node Partek Flow deployments or instances where you would like Partek to manage AWS resources on your behalf, please see Partek AWS support and set up an IAM role for your Partek Flow EC2 instance. In most cases a specialized IAM role is unnecessary and we only need instance ssh keys.
Shutdown Behaviour: Stop
Enable Termination Protection: select Protect against accidental termination
Monitoring: leave Enable CloudWatch Detailed Monitoring disabled
EBS-optimized Instance: Make sure Launch as EBS-optimized Instance is enabled. Given the recommended choice of an m4 instance type, EBS optimization should be enabled at no extra cost.
Tenancy: Shared - Run a shared hardware instance
Network Interfaces: leave as-is
Advanced Details: leave as-is
Click Next: Add Storage. You should be on Step 4: Add Storage
For the existing root volume, set the following options:
Size: 8 GB
Volume Type: Magnetic
Select Delete on Termination
Note: All Partek Flow data is stored on a non-root EBS volume. Since only the OS is on the root volume and the instance is not frequently rebooted, a fast root volume is probably not necessary or worth the cost. For more information about EBS volumes and their performance, see the section EBS volumes.
Click Add New Volume and set the following options:
Volume Type: EBS
Device: /dev/sdb (take the default)
Do not define a snapshot
Size (GiB): 500
Note: This is the minimum for ST1 volumes, see: EBS volumes
Volume Type: Throughput optimized HDD (ST1)
Do not delete on terminate or encrypt
Click Next: Add Tags
You do not need to define any tags for this new EC2 instance, but you can if you would like.
Click Next: Configure Security Group
For Assign a Security Group select Create a New Security Group
Security Group Name: Flow-SG
Description: Security group for Partek Flow server
Add the following rules:
SSH set Source to My IP (or the address range of your company or institution)
Click Add Rule:
Set Type to Custom TCP Rule
Set Port Range to 8080
Set Source to anywhere (0.0.0.0/0, ::/0)
Note: It is recommended to restrict Source to just those that need access to Partek Flow.
Click Review and Launch
The AWS console will suggest this server not be booted from a magnetic volume. Since there is not a lot of IO on the root partition and reboots will be rare, choosing Continue with Magnetic will reduce costs. Choosing an SSD volume will not provide substantial benefit, but it is OK if one wishes to use an SSD volume. See the EBS Volumes section for more information.
Click Launch
Create a new keypair:
Name the keypair Flow-Key
Download this keypair, then run chmod 600 Flow-Key.pem (the downloaded key) so it can be used.
Back up this key, as one may lose access to the Partek Flow instance without it.
The new instance will now boot. Use the left navigation bar and click on Instances. Click the pencil icon and assign the instance the name Partek Flow Server
The server should be assigned a fixed IP address. To do this, click on Elastic IPs on the left navigation menu from the EC2 Management Console.
Click Allocate New Address
Assign Scope to VPC
Click Allocate
On the table containing the newly allocated elastic IP, right click and select Associate Address
For Instance, select the instance named Partek Flow Server
For Private IP, select the one private IP available for the Partek Flow EC2 instance, then click Associate
Note: For the remaining steps, we refer to the elastic ip as elastic.ip
SSH to the new Flow-Server instance:
Attach, format, and move the ubuntu home directory onto the large ST1 elastic block store (EBS) volume. All Partek Flow data will live in this volume. Consult the AWS EC2 documentation for further information about attaching EBS volumes to your instance.
Note: Under Volumes in the EC2 management console, inspect Attachment Information. It will likely list the large ST1 EBS volume as attached to /dev/sdb. Replace "s" with "xv" to find the device name to use for this mkfs command.
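The renaming rule in the note above can be written as a small Python helper. This is a sketch under the assumption stated in the note, namely that the console lists the volume as /dev/sdX and the instance exposes it as /dev/xvdX; the function name is hypothetical, and the actual device name on the instance should still be confirmed before formatting anything.

```python
def xen_device_name(console_device: str) -> str:
    """Map a console device name such as /dev/sdb to /dev/xvdb."""
    prefix = "/dev/sd"
    if not console_device.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix}: {console_device}")
    # Keep the trailing letter(s) and swap the 'sd' prefix for 'xvd'.
    return "/dev/xvd" + console_device[len(prefix):]


print(xen_device_name("/dev/sdb"))  # /dev/xvdb
```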
Make a note of the newly created UUID for this volume
Copy the ubuntu home directory onto the EBS volume using a temporary mount point:
Make the EBS volume mount at system boot:
Add the following to /etc/fstab: UUID=the-UUID-from-the-mkfs-command-above /home ext4 defaults,nofail 0 2
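To avoid typing mistakes in the fstab line, the entry can be generated from the UUID. The Python sketch below only builds the string shown above; the function name and the placeholder UUID are hypothetical, and the result still has to be added to /etc/fstab by an administrator.

```python
def fstab_entry(uuid: str, mount_point: str = "/home", fs_type: str = "ext4") -> str:
    """Build the /etc/fstab line described in this guide for a volume UUID."""
    if not uuid or any(ch.isspace() for ch in uuid):
        raise ValueError("UUID must be a non-empty string without whitespace")
    # 'nofail' lets the system boot even if the volume is missing;
    # the final '0 2' fields disable dump and set the fsck pass order.
    return f"UUID={uuid} {mount_point} {fs_type} defaults,nofail 0 2"


# Placeholder UUID for illustration only; use the value reported for your volume.
print(fstab_entry("0a1b2c3d-1111-2222-3333-444455556666"))
```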
Disconnect the ssh session, then log in again to make sure all is well
Note: For additional information about Partek Flow installations, see our generic Installation Guide
Before beginning, send the media access control (MAC) address of the EC2 instance to licensing@partek.com. The output of ifconfig will suffice. Given this information, Partek employees will create a license for your AWS server. MAC addresses will remain the same after stopping and starting the Partek Flow EC2 instance. If the MAC address does change, let our licensing department know and we can add your license to our floating license server or suggest other workarounds.
Install required packages for Partek Flow:
Install Partek Flow:
Note: Make sure you are running as the ubuntu user.
Partek Flow has finished loading when you see INFO: Server startup in xxxxxxx ms in the partek_flow/logs/catalina.out log file. This takes ~30 seconds.
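Instead of watching the log by eye, the startup message can be detected with a short Python sketch. It assumes the message format quoted above (INFO: Server startup in ... ms); the function name and the sample log text are hypothetical.

```python
import re
from typing import Optional

# Pattern for the startup message quoted in this guide.
STARTUP_PATTERN = re.compile(r"INFO: Server startup in (\d+) ms")


def startup_millis(log_text: str) -> Optional[int]:
    """Return the startup time in ms from the most recent startup line, or None."""
    matches = STARTUP_PATTERN.findall(log_text)
    return int(matches[-1]) if matches else None


# Made-up log excerpt used only to demonstrate the function.
sample = "INFO: starting\nINFO: Server startup in 28764 ms\n"
print(startup_millis(sample))  # 28764
```

In practice the text would come from reading partek_flow/logs/catalina.out; a return value of None means the server has not finished loading yet.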
Alternative: Install Flow with Docker. Our base packages are located here: https://hub.docker.com/r/partekinc/flow/tags
Open Partek Flow with a web browser: http://elastic.ip:8080/
Enter license key
Set up the Partek Flow admin account
Leave the library file directory at its default location and check that the free space listed for this directory is consistent with what was allocated for the ST1 EBS volume.
Done! Partek Flow is ready to use.
After the EC2 instance is provisioned, we are happy to assist with setting up Partek Flow or address other issues you encounter with the usage of Partek Flow. The quickest way to receive help is to allow us remote access to your server by sending us Flow-Key.pem and amending the SSH rule for Flow-SG to include access from IP 97.84.41.194 (Partek HQ). We recommend sending us the Flow-Key.pem via secure means. The easiest way to do this is with the following command:
We also provide live assistance via GoTo meeting or TeamViewer if you are uncomfortable with us accessing your EC2 instance directly. Before contacting us, please run $ ./partek_flow/flowstatus.sh to send us logs and other information that will assist with your support request.
With newer EC2 instance types, it is possible to change the instance type of an already deployed Partek Flow EC2 server. We recommend doing several rounds of benchmarks with production-sized workloads and evaluating whether the resources allocated to your Partek Flow server are sufficient. You may find that reducing resources allocated to the Partek Flow server comes with significant cost savings, but can cause UI responsiveness and job run-times to reach unacceptable levels. Once you have found an instance type that works, you may wish to use reserved instance pricing, which is significantly cheaper than on-demand instance pricing. Reserved instances come with 1 or 3-year usage terms. Please see the EC2 Reserved Instance Marketplace to sell or purchase existing reserved instances at reduced rates.
The network performance of the EC2 instance type becomes an important factor if the primary usage of Partek Flow is for alignment. For this use case, one will have to move copious amounts of data back and forth between the Partek Flow server and the end users (input fastq files in, output bam files out), so it is important to have what AWS refers to as high network performance, which in most cases is around 1 Gb/s. If the focus is primarily on downstream analysis and visualization (e.g. the primary input files are ADAT) then network performance is less of a concern.
We recommend HVM virtualization as we have not seen any performance impact from using it and non-HVM instance types can come with significant deployment barriers.
Make sure your instance is EBS optimized by default and you are not charged a surcharge for EBS optimization.
T-class servers, although cheap, may slow responsiveness for the Partek Flow server and generally do not provide sufficient resources.
We do not recommend placing any data on instance store volumes since all data is lost on those volumes after an instance stops. This is too risky as there are cases where user tasks can take up unexpected amounts of memory forcing a server stop/reboot.
The values below were updated April 2017. The latest pricing and EC2 resource offerings can be found at http://www.ec2instances.info
Single server recommendation: m4.xlarge or m4.2xlarge
Network performance values for US-EAST-1 correspond to: Low ~ 50Mb/s, Medium ~ 300Mb/s, High ~ 1Gb/s.
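To get a feel for what these tiers mean in practice, the Python sketch below estimates how long a transfer would take at each rate. It is a rough illustration that uses the approximate figures quoted above, ignores protocol overhead, and assumes decimal units (1 GB = 1000 MB); the 100 GB data set size is a made-up example.

```python
# Approximate throughput per network performance tier, in megabits per second,
# taken from the US-EAST-1 figures quoted in this guide.
TIER_MBPS = {"Low": 50, "Medium": 300, "High": 1000}


def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Seconds needed to move size_gb gigabytes at rate_mbps megabits per second."""
    megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return megabits / rate_mbps


for tier, rate in TIER_MBPS.items():
    hours = transfer_seconds(100, rate) / 3600
    print(f"{tier}: {hours:.2f} h for 100 GB")
```

Under these assumptions, 100 GB takes roughly 4.4 hours at the Low tier and about 13 minutes at the High tier.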
Choice of a volume type and size:
This is dependent on the type of workload. For most users, the Partek Flow server tasks are alignment-heavy so we recommend a throughput optimized HDD (ST1) EBS volume since most aligner operations are sequential in nature. For workloads that focus primarily on downstream analysis, a general purpose SSD volume will suffice but the costs are greater. For those who focus on alignment or host several users, the storage requirements can be high. ST1 EBS volumes have the following characteristics:
Max throughput 500 MiB/s
$0.045 per GB-month of provisioned storage ($22.50 per month for 500 GB of storage).
Note that EBS volumes can be grown or performance characteristics changed. To minimize costs, start with a smaller EBS volume allocation of 0.5 - 2 TB as most mature Partek Flow installations generate roughly this amount of data. When necessary, the EBS volume and the underlying file system can be grown on-line (making ext4 a good choice). Shrinking is also possible but may require the Partek Flow server to be offline.
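The storage cost arithmetic above can be reproduced with a few lines of Python. This is a sketch that uses the $0.045 per GB-month figure quoted in this section, which dates from April 2017 and should be checked against current pricing; the function name is hypothetical.

```python
ST1_PRICE_PER_GB_MONTH = 0.045  # USD, figure quoted in this guide (April 2017)


def st1_monthly_cost(size_gb: float) -> float:
    """Monthly cost in USD for a provisioned ST1 volume of size_gb gigabytes."""
    return round(size_gb * ST1_PRICE_PER_GB_MONTH, 2)


# Sizes spanning the 0.5 to 2 TB range suggested above.
for size_gb in (500, 1000, 2000):
    print(f"{size_gb} GB: ${st1_monthly_cost(size_gb):.2f} per month")
```

At this rate the 0.5 to 2 TB range corresponds to $22.50 to $90.00 per month.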
Partek Flow is a web-based application for genomic data analysis and visualization. It can be installed on a desktop computer, computer cluster or cloud. Users can then access Partek Flow from any browser-enabled device, such as a personal computer, tablet or smartphone.
Read on to learn about the following installation topics:
Regardless of whether Partek Flow is installed on a server or the cloud, users will be interacting with the software using a web browser. We support the latest Google Chrome, Mozilla Firefox, Microsoft Edge and Apple Safari browsers. While we make an effort to ensure that Partek Flow is robust, note that some browser plugins may affect the way the software is viewed on your browser.
If you are installing Partek Flow on your own single-node server, we require the following for successful installation:
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8 or later versions of these distributions
64-bit 2GHz quad-core processor¹
48GB of RAM²
> 2TB of storage available for data
> 100GB on the root partition
A broadband internet connection
We support Docker-based installations. Please contact support@partek.com for more information.
¹ Note that some analyses have higher system requirements. For example, running the STAR aligner on a reference genome of size ~3 GB (such as human, mouse or rat) requires 16 cores.
² Input sample file size can also impact memory usage, which is particularly the case for TopHat alignments.
Increasing hardware resources (cores, RAM, disk space, and speed) will allow for faster processing of more samples.
If you are licensed for the Single Cell Toolkit, please see Single Cell Toolkit System Requirements for amended hardware requirements.
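The minimum hardware figures listed above can be turned into a simple pre-installation check. The Python sketch below is illustrative only: the function name and the example server are hypothetical, and it compares values you supply rather than probing the machine.

```python
# Minimums quoted in the list above for a single-node server.
MIN_CORES = 4
MIN_RAM_GB = 48
MIN_DATA_TB = 2    # the guide asks for more than 2 TB
MIN_ROOT_GB = 100  # the guide asks for more than 100 GB


def unmet_requirements(cores, ram_gb, data_tb, root_gb):
    """Return a list of messages, one for each requirement that is not met."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"cores: need at least {MIN_CORES}, have {cores}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: need at least {MIN_RAM_GB} GB, have {ram_gb} GB")
    if data_tb <= MIN_DATA_TB:
        problems.append(f"data storage: need more than {MIN_DATA_TB} TB, have {data_tb} TB")
    if root_gb <= MIN_ROOT_GB:
        problems.append(f"root partition: need more than {MIN_ROOT_GB} GB, have {root_gb} GB")
    return problems


# Hypothetical server used only to show the output format.
for line in unmet_requirements(cores=8, ram_gb=32, data_tb=4, root_gb=250):
    print(line)
```

An empty list means the supplied figures satisfy the baseline; analyses such as STAR alignment may still need more, as the notes above explain.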
Proper storage planning is necessary to avoid future frustration and costly maintenance. Here are several DO's and DO NOT's:
DO:
Plan for at least 3 to 5 times more storage than you think is necessary. Investing extra funds in storage and storage performance is always worth it.
Keep all Flow data on a single partition that is expandable, such as RAID or LVM.
Back up your data, especially the Partek Flow database.
DO NOT:
Store data on 'removable' USB drives. Partek Flow will not be able to see these drives.
Store data across multiple partitions or folder locations. This will increase the maintenance burden substantially.
Use non-Linux file systems like NTFS.
Open a terminal window and enter the following command.
Debian/Ubuntu:
RedHat/Fedora/CentOS:
The uninstall removes binaries only (/opt/partek_flow). The logs, database (partek_db) and files in the /home/flow/.partekflow folder will remain unaffected.
Stop and quit Partek Flow using the Partek Flow app in the menu.
Using Finder, delete Flow application from the Applications menu.
Figure 1. Control of Partek Flow through the menu bar
This process does not delete data or the library files. Users who wish to delete those can delete them using Finder or terminal. The default location of project output files and library files is the /FlowData directory under the user's home folder. However, the actual location may vary depending on your System or Project specific settings.
Before performing updates, we recommend Backing Up the Database.
Updates are applied using the Linux package manager.
Make sure Partek Flow is stopped before updating it.
To update Partek Flow, open a terminal window and enter the following command.
For Debian/Ubuntu, enter:
For Redhat/Fedora/CentOS, enter:
For the YUM package manager, if updating Partek Flow fails with a message claiming "package not signed," enter:
Note that our packages are signed and the message above is erroneous.
For tomcat build update, download the latest version from below:
Because of the large size of single cell RNA-Seq data sets and the computationally-intensive tools used in single cell analysis, we have amended our system requirements and recommendations for installations of Partek Flow with the Single Cell toolkit.
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 64 GB of RAM
Local scratch space*: 1 TB with cached or native speeds of 2GB/s or higher
Storage: > 2 TB available for data and > 100 GB on the root partition
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 128 GB of RAM
Local scratch space*: 2 TB with cached or native speeds of 2GB/s or higher
Storage: > 2 TB available for data and > 100 GB on the root partition
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 256 GB of RAM
Local scratch space*: 2 TB with cached or native speeds of 2GB/s or higher
Storage: > 4 TB available for data
Linux: Ubuntu® 18.04, Redhat® 8, CentOS® 8, or newer
CPU: 64-bit 2 GHz quad-core processor
Memory: 512 GB of RAM
Local scratch space*: 10 TB with cached or native speeds of 2GB/s or higher
Storage: 10 TB available for data
For fastest performance:
Newer generation CPU cores with avx2 or avx-512 are recommended.
Performance scales proportionally with the number of CPU cores available.
Hyper-threaded cores (threads) improve performance for most operations other than principal component analysis.
*Contact Partek support for recommended setup of local scratch storage
Docker can be used alongside Partek Flow to deploy an easy-to-maintain environment that avoids dependency issues and is easy to relocate among different servers if needed.
One can follow the Docker documentation to install and get started.
“Docker is a platform for developers and sysadmins to build, run, and share applications with containers. The use of containers to deploy applications is called containerization.”
This command will output the details of the currently running containers including port forwarding, container name/id, and uptime.
This command lets us enter the running container’s environment to troubleshoot any issues with the container. (Containers are not meant to be modified in place; the correct way to resolve an issue is to create a new container once troubleshooting is complete.)
“Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.”
Below is an example of a docker-compose.yml file which can be used to bring up a Partek Flow server with an extra worker.
These are some of the important tags shown above:
restart: whether you want the container to be restarted automatically upon failure or system restart.
image: the image we distribute and the desired version. We always recommend running the latest version of the software, but if you need a specific version of Partek Flow, please visit here.
environment: here we set any environment variables to be used by the container.
port: the default port for Partek Flow is 8080. To change the port on which the server is accessible, change the first part (left of the colon) of 8080:8080. For example, to access the server on port 8888, use 8888:8080.
mac_address: this needs to match your license file
volumes: in this section we specify the folder on the server to be shared with the container. This lets us persist and access the files we create and use in the container.
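The port tag convention described above (host port on the left of the colon, container port on the right) can be illustrated with a small Python sketch. The helper names are hypothetical; only the 8080 default and the 8888:8080 example come from this guide.

```python
FLOW_CONTAINER_PORT = 8080  # default Partek Flow port inside the container


def port_mapping(host_port: int, container_port: int = FLOW_CONTAINER_PORT) -> str:
    """Build a Compose-style 'HOST:CONTAINER' port mapping string."""
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid host port: {host_port}")
    return f"{host_port}:{container_port}"


def published_port(mapping: str) -> int:
    """Return the host-side port (left of the colon) from a mapping string."""
    host, _, _container = mapping.partition(":")
    return int(host)


print(port_mapping(8888))           # 8888:8080
print(published_port("8888:8080"))  # 8888
```

Only the left-hand value changes when the server should be reachable on a different port; the container side stays at 8080.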
Partek Flow provides the infrastructure to isolate data from different users within the same server. This guide will provide general instructions on how to create this environment within Partek Flow. This can be modified to accommodate existing file systems already accessible to the server.
Go to Settings > Directory permissions and restrict parent folder access (typically /home/flow) to Administrator accounts only
Click the Permit access to a new directory button and navigate to the folder with your library files (typically /home/flow/FlowData/library_files). Select the All users (automatically updated) checkbox to permit all users (including those that will be added in the future) to see the library files associated with the Partek Flow server
Then go to System preferences > Filesystem and storage and set the Default project output directory to "Sample file directory"
Create your first user and select the Private directory checkbox. Specify where the private directory for that user is located
If needed, you can create a user directory by clicking Browse > Create new folder
This automatically sets browsing permissions for that private directory to that user
When a user creates a project, the default project output directory is now within their own restricted folder
More importantly, other users cannot see it
Add additional users as needed
Flow ships with tasks that do not have all of their dependencies included. On startup Flow will attempt to install the dependencies, but not every system is equipped to install them.
If you run into difficulties, we highly recommend using a Docker deployment instead (cluster installations may require Singularity, which is still somewhat a work in progress).
Requires Python 2.7 or later.
On startup Flow will attempt to install additional python packages using the command
Requires R 3.2.3 or later.
On startup Flow will attempt to install additional R packages.
There are cascading dependencies, but you can view the core libraries in partek_flow/bin/cnvkit-0.8.5/install.R
If these packages can't be built locally, it may be possible for the user to download them from us (see below).
Requires R 3.0 or later.
On startup Flow will attempt to install additional R packages.
There are cascading dependencies, but you can view the core libraries in partek_flow/bin/deseq_two-3.5/install.R
If these packages can't be built locally, it may be possible for the user to download them from us (see below).
RcppArmadillo may also have dependencies on multi-threading shared objects that may not be on the LD_LIBRARY_PATH
The recommendation is to copy those .so files to a folder and make sure it is available from the LD_LIBRARY_PATH when the server/worker starts.
Additional dynamic libraries (such as libxml2.so) may be missing and we can provide a copy appropriate for the target OS.
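Before starting the server or worker, it can help to confirm that a given shared object is actually reachable through LD_LIBRARY_PATH. The following is a minimal diagnostic sketch, not a Partek tool: the function name `dirs_providing` is hypothetical, and it only inspects the directories listed in the variable, not the system's default loader paths.

```python
import os
import tempfile


def dirs_providing(library_name, ld_library_path):
    """Return the directories in an LD_LIBRARY_PATH-style string that contain library_name."""
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Empty entries are skipped here for simplicity.
        if directory and os.path.isfile(os.path.join(directory, library_name)):
            hits.append(directory)
    return hits


# Demonstration with a throwaway directory standing in for the folder that
# holds the copied .so files (the file here is an empty placeholder).
with tempfile.TemporaryDirectory() as lib_dir:
    open(os.path.join(lib_dir, "libxml2.so"), "w").close()
    search_path = os.pathsep.join(["/nonexistent/dir", lib_dir])
    print(dirs_providing("libxml2.so", search_path) == [lib_dir])  # True
    print(dirs_providing("libmissing.so", search_path))            # []
```

In practice the second argument would be `os.environ.get("LD_LIBRARY_PATH", "")`, read in the same environment that starts the server or worker.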
Requires Python 2.7 or 3.4 or above
On startup Flow attempts to install using pip
Requires Python 3.0 or above
If there are any conflicts with preinstalled python packages, Flow should be configured to run with its own virtual environment:
or
R can usually be installed from the package manager. If the user installs Flow via apt or yum it should already be installed.
Currently, we offer a set of R packages compatible with some versions of R
Extract this file in the home directory. (Make .R a symlink if the home directory doesn't have enough free space)
These packages include the dependencies for both CNVkit and DESeq2
When running R diagnostic commands outside flow, it can simplify things if the environment includes a reference to the ~/.R folder:
or load
in ~/.Rprofile
list loaded packages:
get the version:
This is a compiled Perl script, so it has no direct dependency on Perl itself. We have had one report (istem.fr) of it failing to run.
DECoN requires R version 3.1.2
R 3.1.2 must be installed under /opt/R-3.1.2; otherwise, set the DECON_R environment variable to its folder
Download DECoN
and install it under /opt/DECoN or set the DECON_PATH environment variable to its folder
You may need to add
to Linux/packrat/packrat.opts
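The location rule for DECoN described above (default folder unless an environment variable overrides it) can be sketched as follows. This only mirrors the rule as stated in this guide and is not Flow's actual lookup code; the function name `decon_locations` and the example override path are hypothetical.

```python
import os

# Default locations stated in this guide.
DEFAULT_DECON_R = "/opt/R-3.1.2"
DEFAULT_DECON_PATH = "/opt/DECoN"


def decon_locations(env=None):
    """Return (r_dir, decon_dir), preferring DECON_R / DECON_PATH when set."""
    env = os.environ if env is None else env
    r_dir = env.get("DECON_R") or DEFAULT_DECON_R
    decon_dir = env.get("DECON_PATH") or DEFAULT_DECON_PATH
    return r_dir, decon_dir


# With nothing set, the defaults apply.
print(decon_locations({}))
# Hypothetical override: R 3.1.2 installed under the flow home directory.
print(decon_locations({"DECON_R": "/home/flow/R-3.1.2"}))
```

Calling `decon_locations()` with no argument reads the current process environment, which is a quick way to see what the account running Flow would pick up.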
JKS or Java KeyStore is used in Flow for some very specific scenarios where encryption is involved and there is a need for asymmetric encryption.
Partek Flow ships with its own Java KeyStore. The file is found at .../partek_flow/distrib/flowkeystore, and you may want to add your public and private certificates to it.
If you already have a certificate please skip to the next step.
Please place the key in a secure folder. We advise placing it in Flow's home directory, e.g. /home/flow/keys.
The commands above are meant to be run in a terminal. There are other ways to create a certificate, but they are not covered here.
If you wish to understand the flags used above please refer to the OpenSSL documentation.
For this step you will need to find the cacerts file, which is located under the Java installation. If you do not know how to find it, contact us and we can help.
In the example the cacerts file is located at /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts
We need to tell Partek Flow where the key is located. To do this, edit the file that contains some of the Flow settings.
The file is usually located at /etc/partekflow.conf. If you do not have this file, we advise using the bashrc file of the system user that runs Partek Flow.
At the end of that file please add:
Figure 1. Setting up the Partek Flow license during installation
Figure 2. Setting up the Partek Flow 'admin' account during installation
Figure 3. Selecting the library file directory
Instance Type | Memory | Cores | EBS throughput | Network Performance | Monthly cost |
---|---|---|---|---|---|
m4.large | 8.0 GB | 2 vCPUs | 56.25 MB/s | Medium | $78.84 |
r4.large | 15.25 GB | 2 vCPUs | 50 MB/s | High (+10G interface) | $97.09 |
m4.xlarge | 16.0 GB | 4 vCPUs | 93.75 MB/s | High | $156.95 |
r4.xlarge | 30.5 GB | 4 vCPUs | 100 MB/s | High | $194.18 |
m4.2xlarge | 32.0 GB | 8 vCPUs | 125 MB/s | High | $314.63 |
r4.2xlarge | 61.0 GB | 8 vCPUs | 200 MB/s | High (+10G interface) | $388.36 |
Please contact us if you would like to install Partek Flow on your own HPC or cloud account. We will assist in assessing your hardware needs and can make recommendations regarding provisioning sufficient resources to run the software.
Partek will work with the customer to make a docker-compose file that will have all the configuration necessary to run Partek Flow on any machine that meets our requirements.
For older operating systems R is not available and will need to be installed from
DECoN comes pre-installed in the flow_dna container
Documentation on installing DECoN is available here:
See also:
Below are the yaml documents which describe the bare minimum infrastructure needed for a functional Flow server. It is best to start with a single-node proof of concept deployment. Once that works, the deployment can be extended to multi-node with elastic worker allocation. Each section is explained below.
On a kubernetes cluster, all Flow deployments are placed in their own namespace, for example namespace: partek-flow. The label app.kubernetes.io/name: flowheadnode allows a service to bind to this headnode pod and can be used to target the pod from other kubernetes infrastructure. The label deployment: dev allows running multiple Flow instances in this namespace (dev, tst, uat, prd, etc.) if needed and allows workers to connect to the correct headnode. For stronger isolation, running each Flow instance in its own namespace is optimal.
The Flow docker image requires 1) a writable volume mounted to /home/flow; 2) that this volume be readable and writable by UID:GID 1000:1000; and 3) for a multi-node setup, that this volume be cross-mounted to all worker pods. In the multi-node case, the persistent volume would be backed by a network storage device such as EFS, NFS, or a mounted FileGateway.
This section achieves goal 2)
The flowconfig volume is used to override behavior for custom Flow builds and custom integrations. It is generally not needed for vanilla deployments.
Partek Flow is shipped as a single docker image containing all necessary dependencies. The same image is used for worker nodes. Most deployment-related configuration is set as environment variables. Auxiliary images are available for additional supporting infrastructure, such as flexlm and worker allocator images.
Official Partek Flow images can be found on our release notes page: Release Notes. The image tags assume the format registry.partek.com/rtw:YY.MMMM.build. New rtw images are generally released several times a month.
The image in the example above references a private ECR. It is highly recommended that the target image from registry.partek.com be loaded into your ECR. Image pulls will be much faster from AWS, which reduces the time to dynamically allocate workers. It also removes a single point of failure: if registry.partek.com were down, it would impact your ability to launch new workers on demand.
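When an image is copied into a private registry, the deployment should reference the same repository and tag under the new registry host. The sketch below shows that renaming only; it does not pull or push anything. The function name `mirrored_image` and the ECR host are hypothetical placeholders, and the tag is left in the YY.MMMM.build placeholder form because real tags come from the release notes.

```python
def mirrored_image(source_image, target_registry):
    """Return the image reference to use after copying source_image into target_registry.

    The repository name and tag are kept; only the registry host is swapped.
    """
    _registry, sep, remainder = source_image.partition("/")
    if not sep or not remainder:
        raise ValueError(f"expected REGISTRY/REPOSITORY:TAG, got {source_image!r}")
    return f"{target_registry.rstrip('/')}/{remainder}"


# Placeholder tag and placeholder ECR host; substitute real values.
source = "registry.partek.com/rtw:YY.MMMM.build"
target = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
print(mirrored_image(source, target))
```

Keeping the repository and tag unchanged makes it easy to tell which official release a mirrored image corresponds to.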
Partek Flow uses the head node to handle all interactive data visualization. Additional CPU resources are needed for this: the more the better, and 8 CPUs is a good place to start. As for memory, we recommend 8 to 16 GiB. Resource limits are not included here, but are set to large values globally:
Partek Flow uses FlexLM for licensing. We do not currently offer any alternative. Values for this environment variable can be:
An external flexlm server. We provide a Partek-specific container image and detail a kubernetes deployment for this below. This license server can also live outside the kubernetes cluster; the only requirement is that it is network accessible.
/home/flow/.partekflow/license/Partek.lic - Use this path exactly. This path is internal to the headnode container and is persisted on a mounted PVC.
Unfortunately, FlexLM is MAC address based and does not quite fit in with modern containerized deployments. There is no straightforward or native way for kubernetes to set the MAC address upon pod/container creation, so using a license file on the flowheadnode pod (/home/flow/.partekflow/license/Partek.lic ) could be problematic (but not impossible). In further examples below, we provide a custom FlexLM container that can be instantiated as a pod/service. This works by creating a new network interface with the requested MAC address inside the FlexLM pod.
Please leave this set at "1". Partek Flow need not enforce any limits as that is the responsibility of kubernetes. Setting this to anything else may result in Partek executables hanging.
This is a hodgepodge of Java/Tomcat options. Parts of interest:
It is possible for the Flow headnode to execute jobs locally in addition to dispatching them to remote workers. These two options set resource limits on the Flow internal worker to prevent resource contention with the Flow server. If remote workers are not used and this remains a single-node deployment, meaning ALL jobs will execute on the internal worker, then it is best to remove the CPU limit (-DFLOW_WORKER_CORES) and only set -DFLOW_WORKER_MEMORY_MB equal to the kubernetes memory resource request.
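The choice described above can be sketched as a small helper that builds the two options. This is an illustration under stated assumptions, not Partek's configuration logic: the function name `internal_worker_options` is hypothetical, the memory request is assumed to be already expressed in the MB unit that -DFLOW_WORKER_MEMORY_MB expects, and requiring both limits when remote workers are used is a choice made for this sketch.

```python
def internal_worker_options(memory_request_mb, remote_workers,
                            worker_cores=None, worker_memory_mb=None):
    """Build the -D options for Flow's internal worker as a list of strings."""
    if not remote_workers:
        # Single-node: every job runs on the internal worker, so leave the CPU
        # limit out and give the worker the full memory request.
        return [f"-DFLOW_WORKER_MEMORY_MB={memory_request_mb}"]
    if worker_cores is None or worker_memory_mb is None:
        # Assumption of this sketch: with remote workers, both limits are set
        # explicitly so the internal worker cannot starve the Flow server.
        raise ValueError("with remote workers, choose explicit limits for the internal worker")
    return [
        f"-DFLOW_WORKER_CORES={worker_cores}",
        f"-DFLOW_WORKER_MEMORY_MB={worker_memory_mb}",
    ]


# Single-node example with an illustrative 16384 MB memory request.
print(" ".join(internal_worker_options(16384, remote_workers=False)))
# Multi-node example with illustrative limits for the internal worker.
print(" ".join(internal_worker_options(16384, True, worker_cores=2, worker_memory_mb=4096)))
```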
If Flow connects to a corporate LDAP server for authentication, it will need to trust the LDAP certificates.
JVM heap size. If the internal worker is not used, set this to be a little less than the kubernetes memory resource request. If the internal worker is in use, and the intent is to stay with a single-node deployment, then set this to be ~ 25% of the kubernetes memory resource request, but no less than ~ 4 GiB.
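The sizing rule above works out as follows in a short calculation. This is a sketch of the guideline, not an official formula: the function name `suggested_heap_mib` is hypothetical, and "a little less" is interpreted here as 10% headroom, which is an assumption rather than a Partek recommendation.

```python
def suggested_heap_mib(memory_request_mib, internal_worker_in_use, headroom_percent=10):
    """Suggest a JVM heap size in MiB from the kubernetes memory request.

    headroom_percent is an assumed reading of "a little less"; adjust as needed.
    """
    if internal_worker_in_use:
        # ~25% of the request, but no less than ~4 GiB (4096 MiB).
        return max(memory_request_mib // 4, 4096)
    # Internal worker unused: leave some headroom below the request.
    return memory_request_mib * (100 - headroom_percent) // 100


print(suggested_heap_mib(32768, internal_worker_in_use=True))    # 8192
print(suggested_heap_mib(8192, internal_worker_in_use=True))     # 4096 (floor applies)
print(suggested_heap_mib(16384, internal_worker_in_use=False))   # 14745
```

Note that with a small memory request the 4 GiB floor can approach or exceed the request itself, which suggests the request should be raised rather than the heap lowered.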
The flowheadnode service is needed 1) so that workers have a DNS name (flowheadnode) to connect to when they start and 2) so that we can attach an ingress route to make the Flow web interface accessible to end users. The app.kubernetes.io/name: flowheadnode selector is what binds this to the flowheadnode pod.
80:8080 - Users interact with Flow entirely over a web browser
2552:2552 - Workers communicate with the Flow server over port 2552
8443:8443 - Partek executed binaries connect back to the Flow server over port 8443 to do license checks
This provides external users HTTPS access to Flow at host: flow.dev-devsvc.domain.com. Your details will vary. This is where we bind to the flowheadnode service.
The yaml documents above will bring up a complete Partek-specific license server.
Note that the service name is flexlmserver. The flowheadnode pod connects to this license server via the PARTEKLM_LICENSE_FILE="@flexlmserver" environment variable.
You should deploy this flexlmserver first, since the flowheadnode will need it available in order to start in a licensed state.
Partek will send a Partek.lic file licensed to some random MAC address. When this license is (manually) written to /usr/local/flexlm/licenses, the pod will continue execution by creating a new network interface using the MAC address in Partek.lic, then it will start the licensing service. This is why the NET_ADMIN capability is added to this pod.
The license from Partek must contain VENDOR parteklm PORT=27001 so the vendor port remains at 27001 in order to match the service definition above. Without this, this port is randomly set by FlexLM.
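Before writing the license into the pod, it may be worth checking that the VENDOR line carries the expected port. The following is a minimal sketch, assuming whitespace-separated fields on the VENDOR line as in the text above; the function name `vendor_port` is hypothetical, and the sample strings are illustrative fragments, not a real license.

```python
def vendor_port(license_text, vendor="parteklm"):
    """Return the PORT= value on the VENDOR line for `vendor`, or None if absent."""
    for line in license_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "VENDOR" and fields[1] == vendor:
            for field in fields[2:]:
                if field.upper().startswith("PORT="):
                    return int(field.split("=", 1)[1])
            return None
    return None


# Illustrative fragments only; a real Partek.lic contains more lines.
pinned = "VENDOR parteklm PORT=27001\n"
unpinned = "VENDOR parteklm\n"
print(vendor_port(pinned) == 27001)   # True: matches the service definition
print(vendor_port(unpinned))          # None: FlexLM would pick a random port
```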
This image is currently available from public.ecr.aws/partek-flow/kube-flexlm-server but this may change in the future.