Cluster Installation Guide

Partek® Flow® is a genomics data analysis and visualization software product designed to run on compute clusters. The following instructions assume the most basic setup of Partek Flow and must only be attempted by system administrators who are familiar with Linux-based commands.

These instructions are not intended to be comprehensive. Cluster environments vary widely, so there are no 'one size fits all' instructions. In all cases, Partek Technical Support will be available to assist with cluster installation and maintenance to ensure compatibility with any cluster environment. For single-node installations, refer to the single-node installation guide.

Prior to installation, make sure you have the license key tied to the host-ID of the compute cluster the software will be installed on. Contact licensing@partek.com for key generation.

Instructions

Create a standard Linux user account that will run the Partek Flow server and all associated processes. It is assumed this account is synced between the cluster head node and all compute nodes. For this guide, we name the account flow.

Log in to the flow account and change to the flow home directory:

cd /home/flow

Download Partek Flow and the remote worker package:

wget http://www.partek.com/~devel/flow-release.zip
wget http://www.partek.com/~devel/flow-worker-release.zip

Unzip these files into the flow home directory /home/flow. This yields two directories: partek_flow and PartekFlowRemoteWorker

Partek Flow can generate large amounts of data, so it needs to be configured to store the bulk of this data in the largest shared data store available. For this guide, we assume that the directory is located at /shared. Adjust this path accordingly.

It is required that the Partek Flow server (which runs on the head node) and the remote workers (which run on the compute nodes) see identical file system paths for any directory Partek Flow has read or write access to. Thus /shared and /home/flow must be mounted on the Flow server and all compute nodes. Create the directory /shared/FlowData and allow the flow Linux account write access to it.
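The write-access requirement can be checked with a small script. This is only a sketch: the helper name ensure_writable_dir is hypothetical, and creating /shared/FlowData in the first place may need root privileges followed by a change of ownership to the flow account, depending on how /shared is exported.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Partek Flow): create a directory if it
# does not exist and confirm that the current user can write to it.
ensure_writable_dir() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    # Create and remove a scratch file to prove write access.
    local probe
    probe="$(mktemp "$dir/.write_test.XXXXXX")" || return 1
    rm -f "$probe"
}

# Example (run as the flow user on the head node and on each compute node):
#   ensure_writable_dir /shared/FlowData && echo "FlowData is writable"
```

Running the same check on the head node and on a compute node also confirms that both see the directory at the same path.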

It is assumed the head node is attached to at least two separate networks: (1) a public network that allows users to log in to the head node and (2) a private backend network that is used for communication between compute nodes and the head node. Clients connect to the Flow web server on port 8080, so adjust the firewall to allow inbound connections to 8080 over the public network of the head node. Partek Flow will connect to remote workers over your private network on port 2552, so make sure that port is also open.

Partek Flow needs to be informed of which private network to use for communication between the server and workers. It is possible that there are several private networks available (gigabit, infiniband, etc.), so select one to use. We recommend using the fastest network available. For this guide, let's assume that private network is 10.1.0.0/16. Locate the head node hostname that resolves to an address on the 10.1.0.0/16 network. This must resolve to the same address on all compute nodes.

For example:

host head-node.local

yields

10.1.1.200

Open /home/flow/.bashrc and add this as the last line (the trailing backslashes join it into a single line):

export CATALINA_OPTS="$CATALINA_OPTS -Djava.awt.headless=true \
-DflowDispatcher.flow.command.hostname=head-node.local \
-DflowDispatcher.akka.remote.netty.hostname=head-node.local"

Source .bashrc so the environment variable CATALINA_OPTS is accessible.

NOTE: If workers are unable to connect (below), then replace all hostnames with their respective IPs.
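The resolution check can be scripted so that it gives the same answer on every node. The sketch below assumes the example 10.1.0.0/16 network from this guide and that getent is available (as on most glibc-based Linux systems). The helper names resolve_host and in_flow_network are hypothetical, and the pattern match is a quick prefix test, not a full IP address validation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first address a hostname resolves to.
# Assumes getent is installed.
resolve_host() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Hypothetical helper: succeed if an IPv4 address starts with 10.1. and has
# two more dotted fields, i.e. looks like a member of 10.1.0.0/16.
in_flow_network() {
    case "$1" in
        10.1.*.*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example (run on the head node and on each compute node):
#   ip="$(resolve_host head-node.local)"
#   in_flow_network "$ip" && echo "head-node.local resolves to $ip"
```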

Start Partek Flow

~/partek_flow/start_flow.sh

You can monitor progress by tailing the log file partek_flow/logs/catalina.out. After a few minutes, the server should be up.
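Besides watching the log with tail -f, a script can poll it. The sketch below assumes that the bundled web server writes Tomcat's usual "Server startup in ..." line to catalina.out once it is ready. That is an assumption based on the log file name, not something this guide states, and the helper name flow_log_ready is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed once the given log file contains a
# Tomcat-style startup line. Fails if the file is missing or unreadable.
flow_log_ready() {
    grep -q "Server startup in" "$1" 2>/dev/null
}

# Example: check every 5 seconds until the server reports it is up.
#   until flow_log_ready ~/partek_flow/logs/catalina.out; do sleep 5; done
#   echo "Flow server is up"
```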

Make sure the correct ports are bound:

netstat -tulpn

You should see 10.1.1.200:2552 and :::8080 in the LISTEN state. Inspect catalina.out for additional error messages.
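The netstat output can also be checked by a script. In this sketch the helper name has_listener is hypothetical. It assumes the common Linux netstat -tulpn layout, in which the fourth column is the local address and listening TCP sockets carry the word LISTEN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read netstat-style output on stdin and succeed if a
# listening socket has a local address ending in the given suffix.
has_listener() {
    awk -v want="$1" '
        /LISTEN/ {
            # Column 4 is the local address, e.g. 10.1.1.200:2552 or :::8080.
            n = length(want)
            if (length($4) >= n && substr($4, length($4) - n + 1) == want)
                found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
#   out="$(netstat -tulpn 2>/dev/null)"
#   printf '%s\n' "$out" | has_listener "10.1.1.200:2552" && echo "worker port bound"
#   printf '%s\n' "$out" | has_listener ":8080" && echo "web port bound"
```

Passing the suffix with its leading colon (":8080" rather than "8080") keeps a port such as 18080 from matching by accident.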

Open a browser and go to http://localhost:8080 on the head node to configure the Partek Flow server.

Enter the license key provided (Figure 1)

Figure 1: Setting up the Partek Flow license during installation

If there appears to be an issue with the license, or there is a message about 'no workers attached', restart Partek Flow. It may take 30 seconds for the process to shut down. Make sure the process has terminated before starting the server back up:

~/partek_flow/stop_flow.sh
Then run:
~/partek_flow/start_flow.sh
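Instead of waiting a fixed 30 seconds, a script can wait until the old server process has actually exited. This sketch assumes you have already looked up the process ID of the Flow server (for instance with ps). The helper name wait_for_exit and the variable flow_pid are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait up to $2 seconds (default 60) for process $1 to
# exit. Succeeds once the process is gone, fails if the timeout is reached.
wait_for_exit() {
    local pid="$1" timeout="${2:-60}" waited=0
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example, with flow_pid set to the Flow server's process ID:
#   ~/partek_flow/stop_flow.sh
#   wait_for_exit "$flow_pid" 60 && ~/partek_flow/start_flow.sh
```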

You will now be prompted to set up the Partek Flow admin user (Figure 2). Specify the username (admin), password and email address for the administrator account and click Next.

Figure 2: Setting up the Partek Flow 'admin' account during installation

Partek Flow will also prompt you to set up library files (Figure 3). Partek-distributed library files and aligner indices can be pre-downloaded to save time when performing analyses later. Change the library file directory to /shared/FlowData/library_files. Unnecessary genome builds may be skipped by clicking the red × button before clicking Next.

Figure 3: Downloading Partek-distributed library files during installation

To set up the Partek Flow data paths, click on Settings located on the top-right of the Flow server webpage. On the left, click on Directory permissions then Permit access to a new directory. Add /shared/PartekFlow and allow all users access.

Next, click on System preferences on the left menu and change the data download directory and default project output directory to /shared/PartekFlow/downloads and /shared/PartekFlow/project_output, respectively.

Note: If you do not see the /shared folder listed, click on the Refresh folder list link that is toward the bottom of the download directory dialog.

Since you do not want to run any work on the head node, go to Settings>Server configuration>Task queue settings and uncheck Use internal server worker.

Restart the Flow server:

~/partek_flow/stop_flow.sh
After 30 seconds, run:
~/partek_flow/start_flow.sh
This is needed to disable the internal worker.

Test that remote workers can connect to the Flow server.

Log in as the flow user to one of your compute nodes. Assume the hostname is compute-0. Since your home directory is exported to all compute nodes, you should be able to go to /home/flow/PartekFlowRemoteWorker/

To start the remote worker:

./partekFlowRemoteWorker.sh head-node.local compute-0

These two addresses should both be in the 10.1.0.0/16 address space. The remote worker will output to stdout when you run it. Scan for any errors. You should see the message woot! I'm online.

A successfully connected worker will show up on the Resource management page on the Partek Flow server. This can be reached from the main homepage or by clicking Resource management from the Settings page. Once you have confirmed the worker can connect, kill the remote worker (CTRL-C) from the terminal in which you started it.

Once everything is working, return to library file management and add the genomes/indices required by your research team. If Partek hosts these genomes/indices, they will be downloaded automatically by Partek Flow.

Integration with your queueing system

In effect, all you are doing is submitting the following command as a batch job to bring up remote workers: /home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh head-node.local compute-0

The second parameter for this script can be obtained automatically via:

$(hostname -s)
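As an illustration, the worker command can be wrapped in a small job script that fills in the second parameter on whichever node the job lands. The sketch below is scheduler-agnostic: the helper name write_worker_job and the output file name are hypothetical, and any scheduler directives (queue name, resource requests) and the submit command depend on your queueing system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a job script that starts one remote worker and
# passes the short hostname of the node it runs on as the second parameter.
write_worker_job() {
    local head_node="$1" out="$2"
    cat > "$out" <<EOF
#!/bin/bash
# Scheduler directives (queue name, resource requests) would go here.
exec /home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh ${head_node} "\$(hostname -s)"
EOF
    chmod +x "$out"
}

# Example:
#   write_worker_job head-node.local ~/flow_worker_job.sh
#   ...then submit ~/flow_worker_job.sh with your scheduler's submit command.
```

The hostname lookup is escaped in the template so that it is evaluated on the compute node when the job runs, not on the node where the script is generated.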

Bringing up and shutting down workers

Bring up workers by running the command below. You only need to run one worker per node:
/home/flow/PartekFlowRemoteWorker/partekFlowRemoteWorker.sh head-node.local compute-0

Shutting down workers

Go to the Resource management page and click on the Stop button (red square) next to the worker you wish to shut down. The worker will shut down gracefully: it waits for work currently running on that node to finish, then shuts down.

Updating Partek Flow

For a cluster update, Partek support will send you links to .zip files for Partek Flow and for the remote Flow worker. All of the following actions should be performed as the Linux user that runs Flow. Do NOT run Flow as root.
1) Go to the Flow installation directory. This is usually the home directory of the Linux user that runs Flow and it should contain a directory named "partek_flow". The location of the Flow install can also be obtained by running ps aux | grep flow and examining the path of the running Flow executable.
2) Shut down Flow: ./partek_flow/stop_flow.sh
3) Download the new version of Flow and the Flow worker from the links Partek support sends (the links below are examples):
wget http://www.partek.com/PartekFlow-LINUX-current-version.zip
wget http://www.partek.com/PartekFlow-RemoteWorker-current-version.zip
4) Make sure Flow has exited:
ps aux | grep flow
The flow process should no longer be listed.
5) Unpack the new version of the Flow install and back up the old install:
mv partek_flow partek_flow_prev
mv PartekFlowRemoteWorker PartekFlowRemoteWorker_prev
6)
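The backup in step 5 can be made a little safer with a guard. A plain mv onto an existing partek_flow_prev directory moves the old install inside it instead of replacing it. The sketch below uses a hypothetical helper, backup_dir, that refuses to proceed in that case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename a directory to <name>_prev, refusing to touch
# an existing backup of the same name.
backup_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ -e "${dir}_prev" ]; then
        echo "backup already exists: ${dir}_prev" >&2
        return 1
    fi
    mv "$dir" "${dir}_prev"
}

# Example (from the Flow installation directory, after Flow has exited):
#   backup_dir partek_flow && backup_dir PartekFlowRemoteWorker
```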

Additional Assistance

If you need additional assistance, please visit partek.com/PartekSupport to submit a help ticket or find regional phone numbers to call Partek support.
Last revision: April 15, 2016

Copyright © 2016 by Partek Incorporated. All Rights Reserved. Reproduction of this material without express written consent from Partek Incorporated is strictly prohibited.
