Partek Flow Documentation



Instance Type and AMI Selection

Log in to the AWS management console at: https://console.aws.amazon.com

Click on EC2

Switch to the region where you intend to deploy Flow. This tutorial uses US East (N. Virginia) as an example.

In the left menu, click Instances, then click the Launch Instance button. The first step, Choose an Amazon Machine Image (AMI), appears.

Click the Select button next to Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-f4cc1de2. AMI IDs are region-specific, so the ID shown will differ outside US East (N. Virginia).

For the next step, Choose an Instance Type, the selection depends on your budget and the size of the Flow deployment. We recommend m4.large for testing or cluster front-end operation, m4.xlarge for standard deployments, and m4.2xlarge for alignment-heavy workloads with a large user base. See the section AWS Instance Type Resources and Costs for help choosing the right instance. In most cases, the instance type and associated resources can be changed after deployment, so you are not locked into the choices made in this step.

For the step Configure Instance Details, make the following selections:

Number of instances: 1. An Auto Scaling group is not necessary for single-node deployments.

Purchasing option: Leave Request Spot instances unchecked. Spot instances are relevant only for minimizing the cost of Flow cluster deployments.

Network: If you do not have a VPC already created for Flow, click Create new VPC. This will open a new browser tab for VPC management. 

Use the following settings for the VPC:

Name tag: Flow-VPC

IPv4 CIDR block: 10.0.0.0/16

Select No IPv6 CIDR Block

Tenancy: Default

Click Yes, Create. You may be asked to select a DHCP option set. If so, then make sure the DHCP option set has the following properties:

Options: domain-name = ec2.internal;domain-name-servers = AmazonProvidedDNS;

DNS resolution: leave the defaults set to yes

DNS hostnames: change this to yes, as internal DNS resolution may be necessary depending on the Flow deployment

Once created, the new Flow-VPC will appear in the list of available VPCs. The VPC needs additional configuration for external access. To continue, right-click Flow-VPC, select Edit DNS Resolution, select Yes, and then click Save. Next, right-click Flow-VPC, select Edit DNS Hostnames, select Yes, and then click Save.

Make sure the DHCP option set is set to the one created above. If it is not, right-click the row containing Flow-VPC and select Edit DHCP option sets.
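If you prefer the AWS CLI to the console, the VPC settings above can be sketched as a bash function. This is a minimal sketch that assumes the AWS CLI is installed and configured with credentials for the target region. create_flow_vpc and the aws_cmd wrapper with its DRY_RUN switch are hypothetical helpers, not part of Flow or the AWS tooling; with DRY_RUN=1 the function only prints the commands it would run. The DHCP option set is left at the region default.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the AWS CLI call to stderr and
# return a placeholder ID instead of contacting AWS.
aws_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws $*" >&2
    echo "dry-run-id"
  else
    aws "$@"
  fi
}

# Create Flow-VPC (10.0.0.0/16, default tenancy, no IPv6 block) and turn on
# DNS resolution and DNS hostnames. Prints the new VPC ID.
create_flow_vpc() {
  local vpc_id
  vpc_id=$(aws_cmd ec2 create-vpc --cidr-block 10.0.0.0/16 \
    --instance-tenancy default --query Vpc.VpcId --output text)
  aws_cmd ec2 create-tags --resources "$vpc_id" \
    --tags Key=Name,Value=Flow-VPC >/dev/null
  # modify-vpc-attribute changes only one attribute per call.
  aws_cmd ec2 modify-vpc-attribute --vpc-id "$vpc_id" \
    --enable-dns-support '{"Value":true}' >/dev/null
  aws_cmd ec2 modify-vpc-attribute --vpc-id "$vpc_id" \
    --enable-dns-hostnames '{"Value":true}' >/dev/null
  echo "$vpc_id"
}
```

Typical use is VPC_ID=$(create_flow_vpc). Running DRY_RUN=1 create_flow_vpc first lets you review the calls before any resources are created.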

Close the VPC management tab and go back to the EC2 management console.

Click the refresh arrow next to Create new VPC and select Flow-VPC.

Click Create new subnet and a new browser tab will open with a list of existing subnets. Click Create Subnet and set the following options:

Name tag: Flow-Subnet

VPC: Flow-VPC

VPC CIDRs: This should be automatically populated with the information from Flow-VPC

Availability Zone: It is OK to let Amazon choose for you if you do not have a preference

IPv4 CIDR block: 10.0.1.0/24
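The subnet's CIDR block must lie inside the VPC's block (10.0.1.0/24 inside 10.0.0.0/16). The sketch below checks that condition before creating Flow-Subnet with the AWS CLI. It assumes the AWS CLI is configured for the target region, and it takes the Flow-VPC ID as its argument. cidr_within, create_flow_subnet and the DRY_RUN wrapper are hypothetical helpers. Omitting --availability-zone lets Amazon choose the zone, as in the console step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the call and return a placeholder ID.
aws_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws $*" >&2
    echo "dry-run-id"
  else
    aws "$@"
  fi
}

# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_within INNER OUTER: succeed if block INNER lies entirely inside OUTER.
cidr_within() {
  local inner_ip=${1%/*} inner_len=${1#*/} outer_ip=${2%/*} outer_len=${2#*/}
  # A shorter prefix is a larger block, so it cannot fit inside OUTER.
  (( inner_len >= outer_len )) || return 1
  local mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$inner_ip") & mask) == ($(ip_to_int "$outer_ip") & mask) ))
}

# Create Flow-Subnet (10.0.1.0/24) in the given VPC and print the subnet ID.
create_flow_subnet() {
  local vpc_id=$1 cidr=10.0.1.0/24 subnet_id
  cidr_within "$cidr" 10.0.0.0/16 || { echo "$cidr is outside Flow-VPC" >&2; return 1; }
  subnet_id=$(aws_cmd ec2 create-subnet --vpc-id "$vpc_id" --cidr-block "$cidr" \
    --query Subnet.SubnetId --output text)
  aws_cmd ec2 create-tags --resources "$subnet_id" \
    --tags Key=Name,Value=Flow-Subnet >/dev/null
  echo "$subnet_id"
}
```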

Stay on the VPC dashboard tab and on the left navigation menu, click Internet gateways, then click Create Internet Gateway and use the following options:

Name tag: Flow-IGW

Click Yes, create

The new gateway will be displayed as "detached". Right-click the Flow-IGW gateway, select "Attach to VPC", then select Flow-VPC and click "Yes, Attach".

Click on "Route Tables" on the left navigation menu. 

If one exists, select the route table already associated with Flow-VPC. If not, create a new route table and associate it with Flow-VPC. Select the route table, then click the "Routes" tab toward the bottom of the page. The route Destination = 10.0.0.0/16, Target = local should already be present. Click Edit, then click "Add another route" and set the following parameters:

Destination: 0.0.0.0/0

Target: Flow-IGW (the internet gateway just created)

Click "Save"
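The internet gateway and default route steps above can also be sketched with the AWS CLI. This sketch assumes the configured AWS CLI and uses the main route table that AWS creates along with each VPC. setup_flow_routing and the DRY_RUN wrapper are hypothetical helpers, and the VPC ID is passed in as the only argument.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the call and return a placeholder ID.
aws_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws $*" >&2
    echo "dry-run-id"
  else
    aws "$@"
  fi
}

# Create and attach Flow-IGW, then route 0.0.0.0/0 through it from the VPC's
# main route table. Prints "<igw-id> <route-table-id>".
setup_flow_routing() {
  local vpc_id=$1 igw_id rtb_id
  igw_id=$(aws_cmd ec2 create-internet-gateway \
    --query InternetGateway.InternetGatewayId --output text)
  aws_cmd ec2 create-tags --resources "$igw_id" \
    --tags Key=Name,Value=Flow-IGW >/dev/null
  aws_cmd ec2 attach-internet-gateway --internet-gateway-id "$igw_id" \
    --vpc-id "$vpc_id" >/dev/null
  # The main route table is the one marked association.main=true for this VPC.
  rtb_id=$(aws_cmd ec2 describe-route-tables \
    --filters "Name=vpc-id,Values=$vpc_id" "Name=association.main,Values=true" \
    --query 'RouteTables[0].RouteTableId' --output text)
  aws_cmd ec2 create-route --route-table-id "$rtb_id" \
    --destination-cidr-block 0.0.0.0/0 --gateway-id "$igw_id" >/dev/null
  echo "$igw_id $rtb_id"
}
```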

Close the VPC Dashboard browser tab and go back to the EC2 Management Console tab. We are still on "Step 3: Configure Instance Details." Click the refresh arrow next to "Create new subnet" and select Flow-Subnet.

Auto-assign Public IP: Use subnet setting (Disable)

Placement group: No placement group

IAM role: None. NOTE: For multi-node Flow deployments, or if you would like Partek to manage AWS resources on your behalf, please see (Partek AWS support) and set up an IAM role as needed.

Shutdown behavior: Stop

Enable termination protection: select "Protect against accidental termination"

Monitoring: leave "Enable CloudWatch detailed monitoring" disabled

EBS-optimized instance: Make sure "Launch as EBS-optimized instance" is enabled and non-selectable. With the recommended m4 instance types, EBS optimization is enabled by default at no extra cost.

Tenancy: Shared - Run a shared hardware instance

Network interfaces: leave as-is

Advanced details: leave as-is

Click "Next: Add Storage". You should now be on "Step 4: Add Storage".

For the existing root volume, set the following options:

Size: 8GB

Volume type: Magnetic

Select "Delete on termination"

Click "Add New Volume" and set the following options:

Volume Type: EBS

Device: /dev/sdb (take the default)

Do not define a snapshot

Size (GiB): 500. This is the minimum size for st1 volumes (see the EBS volumes section below).

Volume Type: Throughput Optimized HDD (ST1)

Leave Delete on Termination and Encrypted unchecked
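If you launch from the AWS CLI instead of the console, the storage layout above corresponds to the following --block-device-mappings value for aws ec2 run-instances. This sketch assumes the AMI's root device is /dev/sda1, which is typical for Ubuntu AMIs (aws ec2 describe-images reports the actual RootDeviceName). flow_block_device_mappings is a hypothetical helper. In the EC2 API, "standard" is the Magnetic volume type.

```shell
#!/usr/bin/env bash

# Print the block device mappings for the Flow server:
# an 8 GiB magnetic root volume deleted on termination, plus a 500 GiB st1
# data volume that is kept on termination and not encrypted.
flow_block_device_mappings() {
  cat <<'EOF'
[
  {"DeviceName": "/dev/sda1",
   "Ebs": {"VolumeSize": 8, "VolumeType": "standard", "DeleteOnTermination": true}},
  {"DeviceName": "/dev/sdb",
   "Ebs": {"VolumeSize": 500, "VolumeType": "st1", "DeleteOnTermination": false, "Encrypted": false}}
]
EOF
}
```

You would pass it as --block-device-mappings "$(flow_block_device_mappings)" to aws ec2 run-instances.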

Click "Next: Add Tags"

You do not need to define any tags for this new EC2 instance, but you can if you would like.

Click "Next: Configure Security Group" 

For "Assign a security group" select "Create a new security group"

Security group name: Flow-SG

Description: Security group for Flow server

Add the following rules:

1) SSH: set Source to My IP (or the address range of your company or institution)

2) Click Add Rule and set Type to Custom TCP Rule, Port Range to 8080, and Source to Anywhere (0.0.0.0/0, ::/0). We recommend restricting this rule to the addresses of those who need access to Flow.
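The security group above can be sketched with the AWS CLI as follows. This sketch assumes the configured AWS CLI, takes the Flow-VPC ID and your SSH source range as arguments, and adds IPv4 rules only, because Flow-VPC has no IPv6 block. create_flow_sg and the DRY_RUN wrapper are hypothetical helpers. The optional third argument narrows port 8080 from the 0.0.0.0/0 default, as recommended above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the call and return a placeholder ID.
aws_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws $*" >&2
    echo "dry-run-id"
  else
    aws "$@"
  fi
}

# create_flow_sg VPC_ID SSH_CIDR [FLOW_CIDR]: create Flow-SG with SSH (22)
# and Flow (8080) ingress rules. Prints the security group ID.
create_flow_sg() {
  local vpc_id=$1 ssh_cidr=$2 flow_cidr=${3:-0.0.0.0/0} sg_id
  sg_id=$(aws_cmd ec2 create-security-group --group-name Flow-SG \
    --description "Security group for Flow server" --vpc-id "$vpc_id" \
    --query GroupId --output text)
  aws_cmd ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$ssh_cidr" >/dev/null
  # Prefer a narrower FLOW_CIDR than the 0.0.0.0/0 default where possible.
  aws_cmd ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 8080 --cidr "$flow_cidr" >/dev/null
  echo "$sg_id"
}
```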

Click "Review and Launch"

AWS will suggest that this server not be booted from a magnetic volume. Since there is little I/O on the root partition and reboots will be rare, choosing "Continue with Magnetic" reduces costs. An SSD root volume will not provide a substantial benefit, but it is fine to use one if you prefer.

Click Launch

When prompted, create a new key pair named Flow-Key and download it, then run chmod 600 on the downloaded key so that SSH will accept it. Back up this key, as you may lose access to the Flow instance without it.
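When launching from the AWS CLI, the key pair must exist before the instance is launched. The sketch below assumes the configured AWS CLI and uses the hypothetical helper create_flow_key. It writes the private key under a restrictive umask and then applies chmod 600, so the key is never readable by other users.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the call and return a placeholder.
aws_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws $*" >&2
    echo "dry-run-id"
  else
    aws "$@"
  fi
}

# create_flow_key [FILE]: create the Flow-Key key pair and save the private
# key (default Flow-Key.pem) with mode 600.
create_flow_key() {
  local key_file=${1:-Flow-Key.pem}
  # umask 077 in a subshell keeps the file private from the moment it is written.
  ( umask 077
    aws_cmd ec2 create-key-pair --key-name Flow-Key \
      --query KeyMaterial --output text > "$key_file" )
  # chmod also tightens permissions if the file already existed.
  chmod 600 "$key_file"
}
```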

The new instance will now boot. Use the left navigation bar and click on Instances. Click the pencil icon and assign the instance the name “Flow Server”

Next, we will need to make the server accessible at a fixed address. To do this, click on "Elastic IPs" on the left. 

Click "Allocate new address"

Assign Scope to VPC

Click Allocate

In the table containing the newly allocated Elastic IP, right-click it and select "Associate Address"

For Instance, select the instance named “Flow Server”

For Private IP, select the one private IP available, then click Associate

For the remaining steps, we refer to the Elastic IP address as "elastic.ip"
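The naming and Elastic IP steps above can be sketched with the AWS CLI. This sketch assumes the configured AWS CLI and takes the instance ID, such as i-0123456789abcdef0, as its argument. assign_flow_eip and the DRY_RUN wrapper are hypothetical helpers. When only an instance ID is given, associate-address uses the instance's primary private IP, which matches the single private IP chosen in the console.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the call and return a placeholder ID.
aws_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws $*" >&2
    echo "dry-run-id"
  else
    aws "$@"
  fi
}

# assign_flow_eip INSTANCE_ID: name the instance "Flow Server", allocate a VPC
# Elastic IP, associate it, and print the public address (elastic.ip).
assign_flow_eip() {
  local instance_id=$1 alloc_id
  aws_cmd ec2 create-tags --resources "$instance_id" \
    --tags "Key=Name,Value=Flow Server" >/dev/null
  alloc_id=$(aws_cmd ec2 allocate-address --domain vpc \
    --query AllocationId --output text)
  aws_cmd ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$alloc_id" >/dev/null
  aws_cmd ec2 describe-addresses --allocation-ids "$alloc_id" \
    --query 'Addresses[0].PublicIp' --output text
}
```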

 

SSH to the new Flow Server instance:

chmod 600 Flow-Key.pem

ssh -i Flow-Key.pem ubuntu@elastic.ip

 

Format the large ST1 EBS volume (attached at launch) and move the ubuntu home directory onto it. All Flow data will live on this volume. Consult the AWS EC2 documentation for further information about attaching EBS volumes to your instance.

sudo su

mkfs -t ext4 /dev/xvdb

(Under Volumes in the EC2 console, inspect "Attachment information". It will likely list this volume as attached to /dev/sdb. Replace "sd" with "xvd" to get the device name used in this mkfs command. Running lsblk on the instance also lists the available devices.)

Make a note of the Filesystem UUID that mkfs prints for this volume (blkid /dev/xvdb shows it again if needed)

Copy the ubuntu home directory onto the EBS volume using a temporary mount point:

mount -t ext4 /dev/xvdb /mnt/

rsync -av /home/ /mnt/

umount /mnt/

Make the EBS volume mount at system boot:

Add the following to /etc/fstab: UUID=(the UUID from the mkfs command above) /home   ext4    defaults,nofail        0       2

mount -a

Disconnect the ssh session, then log in again to make sure all is well
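The format-and-move steps above can be collected into one script. This sketch makes several assumptions. It assumes a Xen-based instance type such as m4, where /dev/sdb appears as /dev/xvdb; on Nitro-based types such as m5, EBS volumes appear as NVMe devices (for example /dev/nvme1n1), so the name mapping does not apply. It also assumes the script runs as root against the new, empty volume. xvd_name, fstab_line and migrate_home are hypothetical helper names. migrate_home is destructive, so the pure helpers are the parts worth checking before you run it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Translate the console device name (/dev/sdb) into the Xen kernel name (/dev/xvdb).
xvd_name() {
  echo "/dev/xvd${1#/dev/sd}"
}

# Build the /etc/fstab entry that mounts the volume with this UUID on /home.
fstab_line() {
  echo "UUID=$1 /home ext4 defaults,nofail 0 2"
}

# migrate_home [CONSOLE_DEVICE]: format the volume, copy /home onto it and
# register it in /etc/fstab. DESTRUCTIVE: erases the target device.
migrate_home() {
  [ "$(id -u)" -eq 0 ] || { echo "run as root" >&2; return 1; }
  local dev uuid
  dev=$(xvd_name "${1:-/dev/sdb}")
  mkfs -t ext4 "$dev"
  uuid=$(blkid -s UUID -o value "$dev")
  mount -t ext4 "$dev" /mnt
  rsync -a /home/ /mnt/   # trailing slashes copy the contents of /home
  umount /mnt
  fstab_line "$uuid" >> /etc/fstab
  mount -a
}
```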

 

Install Flow:

Before you begin, send your MAC address to licensing@partek.com. The output of ifconfig will suffice. Given this information, we will create a license for your AWS server. MAC addresses will remain the same if you stop and start your Flow EC2 instance. If you find the MAC address does change, let our licensing department know and we can add your license to our floating license server.
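If you only want the MAC address rather than the full ifconfig output, the sketch below extracts interface names and MAC addresses from the output of ip -o link, which is part of iproute2 on Ubuntu. list_macs is a hypothetical helper, and interface names (eth0, ens3, ...) vary by instance type.

```shell
#!/usr/bin/env bash

# Read `ip -o link` output on stdin and print "interface MAC" for each
# Ethernet interface (loopback is skipped because it is link/loopback).
list_macs() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "link/ether") {
        name = $2
        sub(/:$/, "", name)   # strip the trailing colon from the interface name
        print name, $(i + 1)
      }
    }
  }'
}
```

On the instance, run ip -o link | list_macs and include the result in your message to licensing@partek.com.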

Install the following required packages:

sudo apt-get update

sudo apt-get install  python perl make gcc g++ zlib1g libbz2-1.0 libstdc++6 libgcc1 libncurses5 libsqlite3-0 libfreetype6 libpng12-0 zip unzip libgomp1 libxrender1 libxtst6 libxi6 debconf

Now install Flow (see also the generic installation instructions):

Make sure you are running as the ubuntu user.

cd (we will install Flow to ubuntu's home directory)

wget --content-disposition packages.partek.com/linux/flow-release

unzip PartekFlow*.zip

./partek_flow/start_flow.sh

Flow has finished loading when you see "INFO: Server startup in xxxxxxx ms" in the partek_flow/logs/catalina.out log file. This takes ~30 seconds.
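Instead of watching the log by hand, you can poll it. This sketch assumes the log path used above (~/partek_flow/logs/catalina.out) and matches only the stable part of the message, "Server startup in". wait_for_flow is a hypothetical helper. If Flow has been started before, an earlier startup line may already be in the log, so the check is only meaningful on a fresh log.

```shell
#!/usr/bin/env bash

# wait_for_flow [LOG] [SECONDS]: poll the Tomcat log once per second until the
# startup message appears; fail after SECONDS (default 120).
wait_for_flow() {
  local log=${1:-"$HOME/partek_flow/logs/catalina.out"} limit=${2:-120} waited=0
  until grep -q "Server startup in" "$log" 2>/dev/null; do
    if (( waited >= limit )); then
      echo "Flow did not report startup within ${limit}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Flow is up"
}
```

For example, run ./partek_flow/start_flow.sh and then wait_for_flow before opening Flow in a browser.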

Open Flow with a browser: http://elastic.ip:8080/

Enter license key

Set up Flow admin account

Leave the library file directory at its default location and check that free space listed for this directory matches what was allocated for the ST1 EBS volume.

Done! You are ready to use Flow.

 

Support:

After the EC2 instance is provisioned, we are happy to assist with setting up Flow or other issues you encounter with the usage of Flow. The quickest way is to allow us remote access to your server by sending us Flow-Key.pem and amending the SSH rule for Flow-SG to include access from IP 97.84.41.194 (Partek HQ). We recommend sending us Flow-Key.pem via secure means. The easiest way to do this is with the following command:

curl -F "file=@Flow-Key.pem" https://installfeedback.partek.com/fupload

We also provide live assistance via GoToMeeting or TeamViewer if you are uncomfortable with us accessing your EC2 instance directly. Before contacting us, please run ./partek_flow/flowstatus.sh to send us logs and other information that will assist us with your support request.

 

General recommendations:

With newer EC2 instance types, it is possible to change the instance type of an already deployed Flow EC2 server. We recommend running several rounds of benchmarks with production-sized workloads to evaluate whether the resources allocated to your Flow server are sufficient. Reducing the resources allocated to the Flow server can bring significant cost savings, but it can also push UI responsiveness and job run times to unacceptable levels. Once you have found an instance type that works, you may wish to use reserved instance pricing, which is significantly cheaper than on-demand pricing. Reserved instances come with 1- or 3-year terms. See the EC2 Reserved Instance Marketplace to buy or sell existing reserved instances at reduced rates.

The network performance of the EC2 instance type becomes an important factor if your primary use of Flow is alignment. In this case, large amounts of data must move back and forth between the Flow server and end users (input FASTQ files in, output BAM files out), so it is important to choose an instance with what AWS calls "High" network performance, which in most cases is around 1 Gb/s. If the focus is primarily on downstream analysis and visualization (e.g. the primary input files are ADAT), network performance is less of a concern.
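To judge whether network performance will matter for your workload, you can make a rough estimate of transfer time. This sketch assumes the transfer sustains the nominal rate and uses decimal units (1 GB = 8,000 megabits). Real transfers are usually slower. transfer_minutes is a hypothetical helper.

```shell
#!/usr/bin/env bash

# transfer_minutes GB MBIT_PER_S: minutes needed to move GB gigabytes at a
# sustained rate of MBIT_PER_S megabits per second.
transfer_minutes() {
  awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbps / 60 }'
}
```

For example, transfer_minutes 100 1000 gives 13.3 minutes for 100 GB at the High tier (~1 Gb/s), while transfer_minutes 100 300 gives 44.4 minutes at the Medium tier (~300 Mb/s).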

We recommend HVM virtualization: we have not seen any performance penalty from using it, and non-HVM instance types can come with significant deployment barriers.

Choose an instance type that is EBS-optimized by default, so that you are not charged a surcharge for EBS optimization.

"T-class" servers, although cheap, can make the Flow server slow to respond and generally do not provide sufficient resources.

We do not recommend placing any data on "instance store" volumes, since all data on those volumes is lost when the instance stops. This is too risky, as user tasks can occasionally consume unexpected amounts of memory and force a server stop or reboot.

AWS Instance Type Resources and Costs

The values below were updated April 2017. The latest pricing and EC2 resource offerings can be found at http://www.ec2instances.info

Instance type | Memory   | Cores   | EBS throughput | Network performance   | Monthly cost
m4.large      | 8.0 GB   | 2 vCPUs | 56.25 MB/s     | Medium                | $78.84
r4.large      | 15.25 GB | 2 vCPUs | 50 MB/s        | High (+10G interface) | $97.09
m4.xlarge     | 16.0 GB  | 4 vCPUs | 93.75 MB/s     | High                  | $156.95
r4.xlarge     | 30.5 GB  | 4 vCPUs | 100 MB/s       | High                  | $194.18
m4.2xlarge    | 32.0 GB  | 8 vCPUs | 125 MB/s       | High                  | $314.63
r4.2xlarge    | 61.0 GB  | 8 vCPUs | 200 MB/s       | High (+10G interface) | $388.36

Single server recommendation: m4.xlarge or m4.2xlarge

Network performance values for US-EAST-1 correspond to: Low ~ 50Mb/s, Medium ~ 300Mb/s, High ~ 1Gb/s.

 

EBS volumes:

Volume type:

This depends on the type of workload. For most users, the Flow server tasks will be alignment-heavy, so we recommend a Throughput Optimized HDD (ST1) EBS volume, since most aligner operations are sequential in nature. For workloads that focus primarily on downstream analysis, a General Purpose SSD volume will suffice, but at a higher cost per GB. For those who focus on alignment or host several users, the storage requirements can be high. ST1 EBS volumes have the following characteristics:

Maximum throughput: 500 MiB/s

$0.045 per GB-month of provisioned storage ($22.50 per month for 500 GB of storage).
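Monthly ST1 cost is simply the provisioned size multiplied by the per-GB-month rate. This sketch defaults to the April 2017 rate quoted above. Current pricing may differ, so pass the current rate as the second argument. st1_monthly_cost is a hypothetical helper.

```shell
#!/usr/bin/env bash

# st1_monthly_cost GB [RATE]: monthly cost in dollars of GB of provisioned
# st1 storage at RATE dollars per GB-month (default 0.045).
st1_monthly_cost() {
  awk -v gb="$1" -v rate="${2:-0.045}" 'BEGIN { printf "%.2f\n", gb * rate }'
}
```

st1_monthly_cost 500 gives 22.50, matching the figure above. st1_monthly_cost 2000 gives 90.00, the cost at the upper end of the suggested 0.5 to 2 TB starting range.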

Note that EBS volumes can be grown and their performance characteristics changed after creation. To minimize costs, start with a smaller EBS volume allocation of 0.5 to 2 TB, as most mature Flow installations generate roughly this amount of data. When necessary, the EBS volume and the underlying file system can be grown online (making ext4 a good choice). EBS volumes cannot be shrunk in place: reducing capacity means copying the data to a new, smaller volume, which requires taking the Flow server offline.
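Growing the volume online can be sketched as follows. This sketch assumes the configured AWS CLI and, as in the setup steps, a file system created directly on /dev/xvdb with no partition table. The volume ID and new size are yours to supply. volume_ready and grow_flow_volume are hypothetical helpers. The file system can be resized once the modification state reaches "optimizing" or "completed".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed when an EBS modification state allows the file system to be grown.
volume_ready() {
  case $1 in
    optimizing|completed) return 0 ;;
    *) return 1 ;;
  esac
}

# grow_flow_volume VOLUME_ID NEW_SIZE_GIB [DEVICE]: enlarge the EBS volume,
# wait for AWS to apply the change, then grow the mounted ext4 file system.
grow_flow_volume() {
  local volume_id=$1 new_size_gib=$2 device=${3:-/dev/xvdb} state=""
  aws ec2 modify-volume --volume-id "$volume_id" --size "$new_size_gib" >/dev/null
  until volume_ready "$state"; do
    sleep 10
    state=$(aws ec2 describe-volumes-modifications --volume-ids "$volume_id" \
      --query 'VolumesModifications[0].ModificationState' --output text)
    if [ "$state" = failed ]; then
      echo "volume modification failed" >&2
      return 1
    fi
  done
  # ext4 can be grown while mounted, so Flow can keep running.
  sudo resize2fs "$device"
}
```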


 

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
