Log in to your AWS Management Console. The login URL may depend on your organization.

Click on EC2

Switch to the region where you will deploy Flow. For this tutorial, we will be using US East (N. Virginia).

On the left menu, click on Instances, then click the Launch Instance button. You will be brought to the first step, denoted "Choose an Amazon Machine Image (AMI)".

Click the Select button next to "Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-f4cc1de2"

For the next step, "Choose an Instance Type", the selection depends on your budget and the size of the Flow deployment. We recommend m4.large for testing or cluster front-end operation, m4.xlarge for standard deployments, and m4.2xlarge for alignment-heavy workloads with a large user base. See (AWS instance type resources and costs) for assistance with choosing the right instance. In most cases, the instance type and associated resources can be changed after deployment, so you are not locked in to the choices made for this step.

For "Configure Instance Details", make the following selections:

Number of instances = 1. You do not need to create an autoscaling group for single-node deployments.

Purchasing option: Leave "Request Spot instances" unchecked. Spot instances are relevant only for minimizing the cost of Flow cluster deployments.

Network: If you do not have a VPC already created for Flow, click "Create new VPC". This will open a new browser tab for VPC management.

Use the following settings for the VPC:

Name tag: Flow-VPC

IPv4 CIDR block:

Select "No IPv6 CIDR Block"

Tenancy: Default

Click "Yes, Create". You may be asked to select a DHCP option set. If so, then make sure the DHCP option set you select or create has the following options:

Options: domain-name = ec2.internal;domain-name-servers = AmazonProvidedDNS;

DNS resolution: leave the defaults set to yes

DNS hostname: change this to yes, since you may need internal DNS resolution depending on the Flow deployment.

Once created, the new Flow-VPC will appear in the list of available VPCs. The VPC needs additional configuration for external access. To continue, right click on Flow-VPC and select "Edit DNS Resolution", then select "Yes" and then Save. Next, right click the Flow-VPC and select "Edit DNS Hostnames", select Yes, then Save. 
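The two DNS settings above can also be applied from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials. The function name enable_vpc_dns and the VPC ID in the usage note are hypothetical.

```shell
# Sketch: enable DNS resolution and DNS hostnames on an existing VPC.
# enable_vpc_dns is a hypothetical helper name, not part of the AWS CLI.
enable_vpc_dns() {
  local vpc_id="$1"
  # The two attributes are changed in separate requests.
  aws ec2 modify-vpc-attribute --vpc-id "$vpc_id" --enable-dns-support '{"Value":true}'
  aws ec2 modify-vpc-attribute --vpc-id "$vpc_id" --enable-dns-hostnames '{"Value":true}'
}
```

Usage would look like enable_vpc_dns vpc-0123456789abcdef0, with the ID replaced by the ID of Flow-VPC.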


Now, close the VPC management tab and go back to the EC2 management console. Click the refresh arrow next to "Create new VPC" and select the Flow-VPC VPC.

Next, click "Create new subnet" and a new browser tab will open with a list of existing subnets. Click "Create Subnet" and set the following options:

Name tag: Flow-

VPC CIDRs: This should be automatically populated with the information from Flow-VPC

Availability Zone: It is OK to let Amazon choose for you if you do not have a preference (this tutorial uses us-east-1d)

IPv4 CIDR block:

Create internet gateway:

Stay on the VPC dashboard tab and on the left navigation menu, click "Internet gateways", then click "Create Internet Gateway" and use the following options:

Name tag: Flow-IGW, then click "Yes, Create".

The new gateway will be displayed as "detached". Right click on the Flow-IGW gateway and select "Attach to VPC", then select the VPC created for Flow and click "Yes, Attach".
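For reference, the gateway creation and attachment steps can be sketched with the AWS CLI. This is only an illustration under the assumption that the AWS CLI is installed and configured. The helper name create_and_attach_igw and the VPC ID in the usage note are hypothetical.

```shell
# Sketch: create an internet gateway, tag it, and attach it to a VPC.
# create_and_attach_igw is a hypothetical helper name.
create_and_attach_igw() {
  local vpc_id="$1" name_tag="$2" igw_id
  # Create the gateway and capture only its ID.
  igw_id="$(aws ec2 create-internet-gateway \
    --query 'InternetGateway.InternetGatewayId' --output text)"
  # Give the gateway a Name tag so it is easy to find in the console.
  aws ec2 create-tags --resources "$igw_id" --tags "Key=Name,Value=$name_tag" >/dev/null
  # A new gateway starts out detached; attach it to the VPC.
  aws ec2 attach-internet-gateway --internet-gateway-id "$igw_id" --vpc-id "$vpc_id" >/dev/null
  printf '%s\n' "$igw_id"
}
```

A call such as create_and_attach_igw vpc-0123456789abcdef0 Flow-IGW prints the ID of the new gateway, which is needed later when adding the route.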

Click on "Route Tables" on the left navigation menu. 

If it exists, select the route table already associated with Flow-VPC. If not, make a new route table and associate it with Flow-VPC. Click on the route table, then click the "Routes" tab toward the bottom of the page. The route Destination = Target = local should already be present. Click Edit, then click "Add another route" and set the following parameters:

Destination:

Target: Flow-IGW (the internet gateway just created)

Click "Save"
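The same route can be added from the command line. The sketch below assumes a configured AWS CLI. The helper name add_igw_route is hypothetical, and the destination CIDR block is left as a parameter because it must match the destination you entered in the console.

```shell
# Sketch: add a route that sends traffic for a destination CIDR block
# through an internet gateway. add_igw_route is a hypothetical helper name.
add_igw_route() {
  local route_table_id="$1" destination_cidr="$2" igw_id="$3"
  aws ec2 create-route \
    --route-table-id "$route_table_id" \
    --destination-cidr-block "$destination_cidr" \
    --gateway-id "$igw_id"
}
```

The three arguments are the route table ID, the destination CIDR block, and the ID of Flow-IGW, in that order.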

Close the VPC Dashboard browser tab and go back to the EC2 Management Console tab. We are still on "Step 3: Configure Instance Details." Click the refresh arrow next to "Create new subnet" and select Flow-Subnet.

Auto-assign Public IP: Use subnet setting (Disable)

Placement group: No placement group

IAM role: None. NOTE: For multi-node Flow deployments or instances where you would like Partek to manage AWS resources on your behalf, please see (Partek AWS support) and set up an IAM role as needed.

Shutdown behavior: Stop

Enable termination protection: select "Protect against accidental termination"

Monitoring: leave "Enable CloudWatch detailed monitoring" disabled

EBS-optimized instance: Make sure "Launch as EBS-optimized instance" is enabled and non-selectable. Given the recommended choice of an m4 instance type, EBS optimization should be enabled at no extra cost.

Tenancy: Shared - Run a shared hardware instance

Network interfaces: leave as-is

Advanced details: leave as-is

Click "Next: Add storage"

Add storage:

Root: device /dev/sda1, snapshot -snap-, size 8 GB, volume type = Magnetic, delete on termination, not encrypted.

Add another volume: type = EBS, device /dev/sdb, no snapshot, size = 500 GB (the minimum for st1), volume type = Throughput Optimized HDD, throughput = 20 / 123 (cannot be changed; baseline is 40 MB/s per TiB), no delete on termination, not encrypted.


Tags: none


Create a new security group:

Name = Flow-Testing

Description = Default Flow SG for testing

1) SSH (defaults), source = My IP

2) Add rule: Custom, port range = 8080, source = Anywhere (, ::/0)


At the "Boot from…" prompt, select "Continue with Magnetic".


Create a new key pair named Flow-Testing and download it. Restrict the permissions of the downloaded key file with chmod 600.
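SSH refuses private keys that other users can read, which is why the key file needs mode 600. A minimal sketch of the permission step follows. The helper name secure_key is hypothetical, and the key location in the usage note is assumed to be ~/aws-keys/Flow-Testing.pem, the path used later in this tutorial.

```shell
# Sketch: restrict a downloaded key file to owner read/write only.
# secure_key is a hypothetical helper name.
secure_key() {
  local key_path="$1"
  chmod 600 "$key_path"
  # Show the resulting permission string so the change can be checked.
  ls -l "$key_path" | cut -c1-10
}
```

Running secure_key ~/aws-keys/Flow-Testing.pem should print -rw------- once the permissions are set.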


The instance now boots. Go to Instances and give it the name “Flow Test Server”.


Allocate a new Elastic IP with scope = VPC.

Associate the new address: set resource type = instance, pick “Flow Test Server”, select the one private IP available, and allow reassociation.
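The Elastic IP steps can be sketched with the AWS CLI as well. This assumes a configured AWS CLI. The helper name assign_elastic_ip and the instance ID in the usage note are hypothetical.

```shell
# Sketch: allocate an Elastic IP for use in a VPC and associate it
# with an instance. assign_elastic_ip is a hypothetical helper name.
assign_elastic_ip() {
  local instance_id="$1" alloc_id
  # Allocate the address in the VPC scope and capture its allocation ID.
  alloc_id="$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text)"
  # Associate the address with the instance's primary private IP.
  aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$alloc_id" >/dev/null
  printf '%s\n' "$alloc_id"
}
```

A call such as assign_elastic_ip i-0123456789abcdef0 prints the allocation ID of the new address.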




If you cannot connect, go to Services, VPC, Your VPCs and check the VPC configuration (this should already be solved by the subnet rules above).


Set the frontend domain name (Cluster only):

Click "Get started" under DNS management.

Go to hosted zones and create a hosted zone:

Domain name = flowcluster

Type = private for VPC

VPC = Flow-Testing VPC


In hosted zones, create a record set:

Name = frontend

Type = A - IPv4 address

Value = (flow IP)

Routing policy = Simple


Connect to instance:

chmod the key file (mode 600), then connect:

ssh -i ~/aws-keys/Flow-Testing.pem

sudo su

Set the contents of /etc/hostname to frontend.flowcluster, then reboot.


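sudo su

Create a filesystem on the new volume:

mkfs -t ext4 /dev/xvdb

Test the mount and move the existing home directories:

mount -t ext4 /dev/xvdb /mnt/

rsync -avr /home/ /mnt/

umount /mnt/


Edit /etc/fstab (vi /etc/fstab) and add the following line, using the UUID of your own volume:

UUID=24311e67-c70a-4b4c-9d2a-e9c016dbce29 /home   ext4    defaults,nofail        0       2

mount -a

Log out, then log back in.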
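The UUID in the fstab line is specific to each volume, so it has to be looked up on your own instance. The following sketch builds the entry from a UUID. The helper name fstab_entry is hypothetical, and the blkid lookup shown in the comment assumes the volume is /dev/xvdb as in the steps above.

```shell
# Sketch: build an /etc/fstab entry that mounts an ext4 volume by UUID.
# fstab_entry is a hypothetical helper name. The mount point defaults to /home.
fstab_entry() {
  local uuid="$1" mount_point="${2:-/home}"
  # nofail lets the instance boot even if the volume is missing.
  printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$uuid" "$mount_point"
}

# On the instance, as root, the UUID can be read with blkid:
#   uuid="$(blkid -s UUID -o value /dev/xvdb)"
#   fstab_entry "$uuid" >> /etc/fstab
```

Mounting by UUID rather than by device name keeps the entry valid if the device name changes between boots.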


Install Flow (we assume you have the license). We keep everything in /home/ubuntu, so we need to do a zip install.


Install the following packages beforehand:

sudo apt-get update

sudo apt-get install  python perl make gcc g++ zlib1g libbz2-1.0 libstdc++6 libgcc1 libncurses5 libsqlite3-0 libfreetype6 libpng12-0 zip unzip libgomp1 libxrender1 libxtst6 libxi6 debconf


Now install Flow:

Exit back to the ubuntu user and change to the home directory.


wget --content-disposition



Edit ~/.bashrc (vi ~/.bashrc) and add the following at the end:

export CATALINA_OPTS="-DflowDispatcher.flow.command.hostname=frontend.flowcluster -DflowDispatcher.akka.remote.netty.tcp.hostname=frontend.flowcluster"


source ~/.bashrc
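If you prefer not to edit the file by hand, a line can be appended only when it is not already present, so that repeating the step does not duplicate the export. This is a minimal sketch. The helper name append_once is hypothetical.

```shell
# Sketch: append a line to a file only if that exact line is not there yet.
# append_once is a hypothetical helper name.
append_once() {
  local line="$1" file="$2"
  # grep -F: fixed string, -x: match the whole line, -q: no output.
  grep -Fxq -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example use with the export line above:
#   append_once 'export CATALINA_OPTS="-DflowDispatcher.flow.command.hostname=frontend.flowcluster -DflowDispatcher.akka.remote.netty.tcp.hostname=frontend.flowcluster"' ~/.bashrc
```

The single quotes around the export line keep the inner double quotes intact when the line is written to the file.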




Enter license key

Set up admin account

Leave library file directory where it is and check that free space matches what you expect (~500 GB)


Support and multinode:

IAM Role:

Role name = Flow-Testing

Role type: add Amazon EC2

Leave policies alone



Background info and rationale for choices:

Candidate nodes are listed from cheapest to most expensive. Cost is the dynamic cost (see reserved pricing).

Selection criteria:

Network speed of 1 Gb/s or greater for multi-node setups

HVM virtualization

No EBS-optimized surcharge

Placement group support

No T-class servers, since we do not want to slow responsiveness

No instance store, since all data is lost after an instance stop, which is too risky


EBS-only options:

Type         Mem       Cores    EBS throughput  Netw rate    Monthly cost
m4.large     8.0 GB    2 vCPUs  56.25 MB/s      M            $78.84
r4.large     15.25 GB  2 vCPUs  50 MB/s         H (10G int)  $97.09
*m4.xlarge   16.0 GB   4 vCPUs  93.75 MB/s      H            $156.95
r4.xlarge    30.5 GB   4 vCPUs  100 MB/s        H            $194.18
*m4.2xlarge  32.0 GB   8 vCPUs  125 MB/s        H            $314.63
r4.2xlarge   61.0 GB   8 vCPUs  200 MB/s        H (10G int)  $388.36


Single server recommendation: m4.2xlarge

Cluster head node recommendation: m4.xlarge

Soma “test” server with worst case hardware: m4.large


Performance may suffer if one chooses smaller nodes.

One can always shut down and change the instance type if a particular choice is insufficient. EBS volumes can be grown or have their performance changed.


Network speed (netw rate) US-EAST-1 internal and external.

L = 50Mb/s

M = 300Mb/s

H = 1Gb/s.


See network benchmarks:


EBS types:

Use Throughput Optimized HDD for Flow data.

Max throughput 500 MiB/s

$0.045 per GB-month of provisioned storage

500 GB minimum provisioned storage, so the minimum cost is $22.50 per month for a 500 GB drive.
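The cost and throughput figures above follow from simple arithmetic, sketched here so they can be recomputed for other volume sizes. The helper names are hypothetical. The price and the 40 MB/s per TiB baseline are the values quoted in this document, and the volume size is treated as GiB for the throughput estimate.

```shell
# Sketch: monthly cost and baseline throughput of a Throughput Optimized HDD volume.
# Both helper names are hypothetical.
ebs_monthly_cost() {
  local size_gb="$1" price_per_gb_month="$2"
  awk -v s="$size_gb" -v p="$price_per_gb_month" 'BEGIN { printf "%.2f\n", s * p }'
}

st1_baseline_mbps() {
  local size_gib="$1"
  # Baseline is 40 MB/s per TiB, and 1 TiB = 1024 GiB.
  awk -v s="$size_gib" 'BEGIN { printf "%.1f\n", s / 1024 * 40 }'
}

ebs_monthly_cost 500 0.045   # prints 22.50
st1_baseline_mbps 500        # prints 19.5
```

The 19.5 MB/s baseline for a 500 GB volume corresponds to the rounded value of 20 shown in the launch wizard.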

Single-Node install : change pref to beefy server

Multi-node: less-beefy head node


Other Notes:


Pricing and resource table:



each vCPU is a hyperthread of an Intel Xeon core

1 ECU is the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor


ADD: placement group


Paravirtualization: faster because it is not fully virtualized, but it has a major drawback: you need a region-specific kernel object for each Linux instance. So just stick with HVM for ease of deployment. HVM performance has also improved.


spreadsheet: removed anything that was not EBS optimized. removed GPU nodes, remove micro, nano, small instances

don't care about gpu support

remove < 4 GB memory


Linux grow filesystem:
