Aligners

Next generation sequencing can produce anywhere from hundreds of thousands to tens of millions of short nucleotide sequences (reads) for a single sample. Each base within a read can also carry a quality score that reflects the sequencer's confidence in that base call. Alignment maps these reads to a reference sequence, providing the start and stop positions of each read within the reference as well as a quality metric for the mapping. The aligned reads can then be used for downstream tasks such as variant detection or expression analysis. This document describes the aligners available in Partek® Flow® and illustrates how to perform alignment against a reference sequence.
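To make the per-base quality score concrete, the following minimal Python sketch converts a FASTQ quality string into error probabilities. It assumes the common Phred+33 ASCII encoding, which this document does not specify, and the function name is hypothetical.

```python
def phred_to_error_probs(quality_string, offset=33):
    """Convert a FASTQ quality string into per-base error probabilities.

    Assumption: Phred scaling, Q = -10 * log10(p), with an ASCII offset
    of 33 (Phred+33). Other encodings would need a different offset.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    return [10 ** (-q / 10) for q in scores]


if __name__ == "__main__":
    # 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
    for ch, p in zip("I+", phred_to_error_probs("I+")):
        print(ch, p)
```

A higher score means a lower probability that the base call is wrong, which is why quality-aware aligners can weigh mismatches at low-quality bases less heavily.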

The user should be familiar with:

Showing Aligners

To show the aligners, select an Unaligned reads or Trimmed reads data node. The aligners appear in the context-sensitive menu on the right of the screen (Figure 1).

Figure 1. Showing Aligners from a trimmed reads node

Aligners

Partek® Flow® provides numerous publicly available tools for the alignment process to meet the needs of your specific sequencing experiment. The information below provides a synopsis of each aligner as well as the current version. Please refer to the references section for further information on each aligner.

Bowtie¹ (Version 1.0.0) - Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome. Backtracking is used to conduct a quality-aware, greedy, randomized, depth-first search of all possible alignments based on the specified alignment parameters. Does not handle gapped alignments. Fast, memory efficient, and accurate for short reads of high quality (<50bp). Popular for short DNA-Seq reads and small RNA-Seq reads. (http://bowtie-bio.sourceforge.net/index.shtml)
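The idea of a Burrows-Wheeler index can be illustrated with a toy Python sketch. It builds the transform by naively sorting suffixes and then finds exact matches with a backward search. This is not Bowtie's implementation: it omits the quality-aware backtracking for mismatches, and all function names are hypothetical.

```python
def build_index(genome):
    """Build a toy Burrows-Wheeler index (naive suffix sorting, small inputs only)."""
    text = genome + "$"  # sentinel that sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for suffix 0).
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort strictly before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return suffix_array, bwt, first, occ


def find_exact(read, index):
    """Return sorted 0-based genome positions where read matches exactly."""
    suffix_array, bwt, first, occ = index
    lo, hi = 0, len(bwt)  # half-open range of suffix array rows
    for ch in reversed(read):  # backward search: last base first
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


if __name__ == "__main__":
    index = build_index("ACGTACGT")
    print(find_exact("ACG", index))
```

The index is built once per genome and reused for every read, which is what makes this family of aligners fast and memory efficient.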

Bowtie 2² (Version 2.2.5) - Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome. Alignment involves mapping seed sequences in an ungapped fashion and then performing a gapped extension. Supports a local alignment mode that "soft clips" alignments which do not align end-to-end. Unlike Bowtie, handles gapped alignments, ambiguous bases (N’s), and paired reads that do not align in a paired fashion. Fast, memory efficient, and accurate for longer reads (>50bp) with no upper limit on read length. Popular for DNA-seq reads and small RNA-Seq reads. (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)

BWA³,⁴ (Version 0.7.12) - Uses a Burrows-Wheeler transform to create an index of the genome. Handles gapped alignments and ambiguous bases (N’s). BWA-backtrack uses a backward search and may be optimal for short reads (<70bp). BWA-MEM is typically the fastest and most accurate for longer reads, although BWA-SW may have better sensitivity when alignment gaps are frequent. Popular for DNA-seq variant calling pipelines, but not for RNA-seq, as splicing is not taken into account. (http://bio-bwa.sourceforge.net/)

GSNAP⁵ (Version 2015-12-31(v8)) - A short read aligner (>14bp) using a successive constrained search, capable of handling splicing using either a probabilistic model or database. Built to handle SNPs in alignment. Good sensitivity but slower speed and higher memory usage. Popular for RNA-seq analysis.

Isaac 2⁶ (Version 15.07.16) -

STAR⁷ (Version 2.4.1d) - Splice-aware aligner that utilizes a novel sequential maximum mappable seed search capable of handling splice junctions. Seeds are subsequently stitched together by local alignment. Capable of handling long reads. Good speed and sensitivity for RNA-seq analysis but with high memory usage.
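The sequential seed search described above can be sketched in Python as follows. Starting at the first base of a read, the sketch finds the longest prefix that matches the genome exactly, records it as a seed, and repeats from the first unmatched base. It uses plain substring search rather than the suffix arrays STAR relies on, does no stitching, and all names are hypothetical.

```python
def find_all(pattern, text):
    """Return every 0-based position where pattern occurs in text."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def longest_mappable_prefix(read, genome, start):
    """Longest prefix of read[start:] with an exact match in genome."""
    best_len, best_positions = 0, []
    length = 1
    while start + length <= len(read):
        positions = find_all(read[start:start + length], genome)
        if not positions:
            break
        best_len, best_positions = length, positions
        length += 1
    return best_len, best_positions


def sequential_seeds(read, genome):
    """Split a read into seeds as (read_offset, length, genome_positions)."""
    seeds, start = [], 0
    while start < len(read):
        length, positions = longest_mappable_prefix(read, genome, start)
        if length == 0:
            start += 1  # base absent from the genome: skip it
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


if __name__ == "__main__":
    # Toy genome: exon, intron of G's, exon. The read spans the junction.
    genome = "ACGTTGCA" + "GGGGGGGGGG" + "CTAGCATC"
    print(sequential_seeds("ACGTTGCA" + "CTAGCATC", genome))
```

In this toy case the read splits into two seeds whose genome positions are separated by the intron, which is the signal a splice-aware aligner uses to infer a junction.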

TMAP⁸ (Version 5.0.0) - Integrates a set of aligners (including a modified BWA) to identify candidate mapping locations and performs alignment using the Smith-Waterman algorithm. TMAP is optimized to handle the variable-length reads and error profiles generated by Ion Torrent data.
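For readers unfamiliar with Smith-Waterman, the minimal Python sketch below shows the core dynamic programming step of local alignment. The scoring values (match 2, mismatch -1, linear gap -2) are illustrative assumptions and are not TMAP's defaults. TMAP's actual implementation is heavily optimized and is not reproduced here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_cell
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Because the full matrix costs time proportional to the product of the two sequence lengths, aligners typically run this step only around candidate mapping locations rather than across the whole genome.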

TopHat⁹ (Version 1.4.1) - Two-stage aligner that first utilizes Bowtie(2) to map reads to a reference; reads that remain unaligned are then mapped to a database of possible splice junctions. Popular for RNA-seq analysis with solid performance, speed, and memory usage.

TopHat 2¹⁰ (Version 2.1.0) -

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
