Partek Flow Documentation

...

Attach and format the large ST1 EBS volume, then move the ubuntu home directory onto it. All Flow data will live in this volume. Consult the AWS EC2 documentation for further information about attaching EBS volumes to your instance.

sudo su

mkfs -t ext4 /dev/xvdb

(Under Volumes in the EC2 console, inspect "Attachment information". It will likely list this volume as attached to /dev/sdb. Replace "s" with "xv" to find the device name to use for this mkfs command.)
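The device-name rule described above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and it assumes the console reports a name of the form /dev/sdX and the instance exposes it as /dev/xvdX. Confirm the actual device name on the instance (for example with lsblk) before running mkfs, since formatting the wrong device destroys its data.

```python
def console_to_kernel_device(console_name: str) -> str:
    """Map an EC2 console device name such as /dev/sdb to /dev/xvdb.

    Hypothetical helper: it only applies the "replace s with xv" rule
    and does not check that the device exists on the instance.
    """
    prefix = "/dev/sd"
    if not console_name.startswith(prefix):
        raise ValueError(
            f"expected a name starting with {prefix!r}, got {console_name!r}"
        )
    # Keep the drive letter (and any partition number) after the prefix.
    return "/dev/xvd" + console_name[len(prefix):]


print(console_to_kernel_device("/dev/sdb"))  # prints /dev/xvdb
```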

...

With newer EC2 instance types, it is possible to change the instance type of an already deployed Flow EC2 server. We recommend running several rounds of benchmarks with production-sized workloads and evaluating whether the resources allocated to your Flow server are sufficient. Reducing the resources allocated to the Flow server can bring significant cost savings, but it can also push UI responsiveness and job run-times to unacceptable levels. Once you have found an instance type that works, you may wish to use "reserved instance" pricing, which is significantly cheaper than on-demand instance pricing. Reserved instances come with 1- or 3-year usage terms. Please see the EC2 Reserved Instance Marketplace to sell or purchase existing reserved instances at reduced rates.

The network performance of the EC2 instance type becomes an important factor if your primary use of Flow is alignment. In this case, large amounts of data must move between the Flow server and the end users (input fastq files in, output bam files out), so it is important to have what AWS refers to as "High" network performance, which in most cases is around 1 Gb/s. If the focus is primarily on downstream analysis and visualization (e.g. the primary input files are ADAT), then network performance is less of a concern.

We recommend HVM virtualization, as we have not seen any performance impact from using it, and non-HVM instance types can come with significant deployment barriers.

Make sure your instance type is "EBS optimized" by default, so that you are not charged a surcharge for EBS optimization.

"T-class" servers, although cheap, may slow responsiveness for the Flow server and generally do not provide sufficient resources.

We do not recommend placing any data on "instance store" volumes, since all data on those volumes is lost after an instance stops. This is too risky, as there are cases where user tasks take up unexpected amounts of memory, forcing a server stop/reboot.

...

Latest (April 2017) EC2 instance costs:

The latest pricing and EC2 resource offerings can be found at http://www.ec2instances.info/


Instance Type   Memory     Cores     EBS throughput   Network Performance      Monthly cost
m4.large        8.0 GB     2 vCPUs   56.25 MB/s       Medium                   $78.84
r4.large        15.25 GB   2 vCPUs   50 MB/s          High (+10G interface)    $97.09
*m4.xlarge      16.0 GB    4 vCPUs   93.75 MB/s       High                     $156.95
r4.xlarge       30.5 GB    4 vCPUs   100 MB/s         High                     $194.18
*m4.2xlarge     32.0 GB    8 vCPUs   125 MB/s         High                     $314.63
r4.2xlarge      61.0 GB    8 vCPUs   200 MB/s         High (+10G interface)    $388.36

Single server recommendation: m4.2xlarge

Cluster head node recommendation: m4.xlarge

Soma “test” server with worst case hardware: m4.large

 

Performance may suffer if one chooses smaller nodes.

One can always shut down the server and change the instance type if a particular choice is insufficient. EBS volumes can be grown or their performance changed.

 

Network performance values for US-EAST-1 correspond to: Low ~ 50 Mb/s, Medium ~ 300 Mb/s, High ~ 1 Gb/s.
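To get a feel for what these network tiers mean for moving alignment data, the following sketch estimates transfer times at the approximate rates quoted above. The 50 GB transfer size and the function name are illustrative assumptions, and the estimate ignores protocol overhead and any bottleneck on the end user's side.

```python
# Approximate rates quoted above, in megabits per second.
NETWORK_RATES_MBPS = {"Low": 50, "Medium": 300, "High": 1000}


def transfer_minutes(size_gb: float, rate_mbps: float) -> float:
    """Minutes needed to move size_gb gigabytes (10**9 bytes each)
    over a link that sustains rate_mbps megabits per second."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8000 megabits
    return megabits / rate_mbps / 60


# Hypothetical example: a 50 GB batch of fastq files.
for label, rate in NETWORK_RATES_MBPS.items():
    minutes = transfer_minutes(50, rate)
    print(f"{label}: about {minutes:.1f} minutes for 50 GB")
```

Under these assumptions, the same transfer that takes a few minutes on a "High" instance takes over two hours on a "Low" one.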

 

See network benchmarks: http://epamcloud.blogspot.com.br/2013/03/testing-amazon-ec2-network-speed.html

 

EBS types:

Use throughput optimized HDD for Flow data.

EBS volumes:

Volume type:

This depends on the type of workload. For most users, the Flow server tasks will be alignment-heavy, so we recommend a throughput optimized HDD (ST1) EBS volume, since most aligner operations are sequential in nature. For workloads that focus primarily on downstream analysis, a general purpose SSD volume will suffice, but the costs are greater. For those who focus on alignment or host several users, the storage requirements can be high. ST1 EBS volumes have the following characteristics:

Max throughput 500 MiB/s

$0.045 per GB-month of provisioned storage

...

($22.5 per month for a 500 GB

...

of storage).
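The storage cost scales linearly with the provisioned size, so the monthly bill for other volume sizes can be estimated with a short sketch like the one below. It uses the $0.045 per GB-month figure quoted above; the function name and the example sizes are illustrative assumptions, and current AWS pricing may differ.

```python
ST1_PRICE_PER_GB_MONTH = 0.045  # USD, the figure quoted above


def st1_monthly_cost(size_gb: float) -> float:
    """Estimated monthly cost in USD of an ST1 volume of size_gb gigabytes."""
    return size_gb * ST1_PRICE_PER_GB_MONTH


# Example sizes chosen for illustration only.
for size_gb in (500, 1000, 2000):
    print(f"{size_gb} GB: ${st1_monthly_cost(size_gb):.2f} per month")
```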

Notes about EBS volumes:

Size: 500 (this is the minimum for st1 volumes)

Volume type: Throughput optimized HDD

Throughput: 20 / 123 (can't change). Baseline: 40 MB/s per TiB

No delete on terminate or encrypt
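Because ST1 throughput scales with volume size, it can help to estimate it before choosing a size. The sketch below uses the 40 MB/s per TiB baseline and the 500 maximum quoted in this document. The burst rate of 250 MB/s per TiB is an assumption that is not stated here, and the function name is hypothetical. Under that assumption, a 500 GiB volume works out to roughly 20 MB/s baseline and 122 MB/s burst, which is close to the "20 / 123" pair in the notes above, so that pair presumably shows baseline and burst throughput.

```python
ST1_BASELINE_MB_S_PER_TIB = 40  # from the notes above
ST1_BURST_MB_S_PER_TIB = 250    # assumption: not stated in this document
ST1_MAX_MB_S = 500              # maximum throughput quoted earlier
GIB_PER_TIB = 1024


def st1_throughput(size_gib: float) -> tuple:
    """Return (baseline, burst) throughput in MB/s for an ST1 volume,
    each capped at the volume-type maximum."""
    size_tib = size_gib / GIB_PER_TIB
    baseline = min(ST1_BASELINE_MB_S_PER_TIB * size_tib, ST1_MAX_MB_S)
    burst = min(ST1_BURST_MB_S_PER_TIB * size_tib, ST1_MAX_MB_S)
    return baseline, burst


# Example sizes chosen for illustration only.
for size_gib in (500, 1024, 2048):
    baseline, burst = st1_throughput(size_gib)
    print(f"{size_gib} GiB: baseline {baseline:.1f} MB/s, burst {burst:.1f} MB/s")
```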

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html

Single-node install: prefer a more powerful server

Multi-node install: a less powerful head node is sufficient

 

Other Notes:

 

ECU vs. vCPU:

Each vCPU is a hyperthread of an Intel Xeon core.

1 ECU is the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

 

ADD: placement group

 

Paravirtualization: faster because it is not fully virtualized, but with a major drawback: you need a region-specific kernel object for each Linux instance. So just stick with HVM for ease of deployment. HVM performance has improved.

 

Spreadsheet: removed anything that was not EBS optimized. Removed GPU nodes, since GPU support is not needed. Removed micro, nano, and small instances.

Removed instances with less than 4 GB of memory.

 

Note that EBS volumes can be grown or their performance characteristics changed. To minimize costs, start with a smaller EBS volume allocation of 0.5 - 2 TB, as most mature Flow installations generate roughly this amount of data. When necessary, the EBS volume and the underlying file system can be grown online (making ext4 a good choice). Shrinking is also possible, but may require the Flow server to be offline.

Growing a Linux file system:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html#recognize-expanded-volume-linux


 

Additional assistance

 
