Partek Flow Documentation

Figure: Fusion report of Partek fusion gene detection algorithm. Each row represents a fusion gene candidate (an example is shown)

TopHat-Fusion Algorithm

General Overview

TopHat-Fusion is a version of TopHat (see Chapter 6.1) with the ability to align reads across fusion points and detect fusion genes resulting from breakage and re-joining of two different chromosomes or from rearrangements within a chromosome (3). It is independent of gene annotation and can discover fusion products from known genes, unannotated splice variants of known genes or completely unknown genes.

The reads are first aligned to the genome, and the initially unaligned reads are then split into multiple 25 bp segments which are, in turn, aligned to the genome by Bowtie. The TopHat-Fusion algorithm then identifies the cases where the first and the last 25 bp segments are aligned to either two different chromosomes or two locations on the same chromosome (spacing is defined by the user). The whole read is then used to identify a fusion point. After the initial fusion candidates are defined, all the segments from the initially unaligned reads are realigned against the fusion points (as well as intron boundaries and indels) and the resulting alignments are combined into full read alignments.
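The segment-based candidate search described above can be illustrated with a minimal Python sketch. It is not TopHat-Fusion code: `SegmentHit`, `split_into_segments`, `is_fusion_candidate` and the `min_distance` parameter are hypothetical names introduced here, and dropping a short trailing remainder of the read is a simplification of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

SEGMENT_LENGTH = 25  # segment size stated in the text


@dataclass
class SegmentHit:
    """Hypothetical record of where one segment aligned."""
    chrom: str
    position: int


def split_into_segments(read: str, length: int = SEGMENT_LENGTH) -> List[str]:
    """Cut a read into consecutive segments; a short trailing remainder is dropped."""
    return [read[i:i + length] for i in range(0, len(read) - length + 1, length)]


def is_fusion_candidate(first: Optional[SegmentHit],
                        last: Optional[SegmentHit],
                        min_distance: int) -> bool:
    """Flag a read whose first and last segments align to different chromosomes,
    or to the same chromosome at least min_distance apart (user-defined spacing)."""
    if first is None or last is None:
        return False  # one of the two outer segments did not align
    if first.chrom != last.chrom:
        return True
    return abs(first.position - last.position) >= min_distance
```

In this sketch only the outer two segments decide whether a read becomes a candidate; locating the exact fusion point from the whole read, and the later realignment step, are not modelled.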

The most up-to-date TopHat-Fusion version implemented in Partek Flow when the manual was written (2.0.8) focuses on fusions due to chromosomal rearrangements; fusions resulting from read-through transcription or trans-splicing are not supported. TopHat-Fusion can handle both paired- and single-end reads, but support for color-space reads is still pending. For details, as well as a discussion of the TopHat-Fusion options, see the TopHat-Fusion home page (4).

Running TopHat-Fusion within Partek Flow

TopHat-Fusion is integrated with TopHat 2, and fusion detection is activated by selecting the Fusion search check box in the TopHat 2 Alignment options dialog (Figure 6).

 

Figure 6: Activating TopHat-Fusion algorithm for detection of fusion genes (“Fusion gene” project shown as an example)

The output is associated with the Fusion results data node (Figure 7), which is a part of the TopHat 2 results (in addition to the Aligned reads node and, optionally, the Unaligned reads node).

 

Figure 7: Fusion results node as a result of TopHat-Fusion algorithm

Selecting the Fusion results node opens the toolbox, with Variant detection options (Figure 8).

 

Figure 8: Variant detection options invokable on TopHat-Fusion results

Fusion report displays an annotated report on detected fusion genes. For that purpose an annotation file needs to be specified first (Figure 9).

 

Figure 9: Selecting an annotation file to annotate TopHat-Fusion results (“Fusion gene” project shown as an example)

The result of annotation is the Fusion report task node as seen in Figure 10.

 

Figure 10: Fusion report task node as a result of annotating Fusion results generated by TopHat-Fusion algorithm

The list of annotated fusion genes, in the form of a Fusion report (Figure 11), can be obtained by first selecting the Fusion report task node and then the Task report link in the toolbox. Each row of the table in Figure 11 is a potential fusion event, with the columns providing the following information.

  • Sample ID: sample in which the fusion event was identified;
  • Score: fusion score as defined in the original TopHat-Fusion report (3);
  • Type1: genomic section of the left-hand part of the fusion;
  • Gene1: gene on the left side of the fusion;
  • Transcript1: affected transcript of the Gene1;
  • Type2: genomic section of the right-hand part of the fusion;
  • Gene2: gene on the right side of the fusion;
  • Transcript2: affected transcript of the Gene2;
  • Loci: coordinates of the fusion event (a dash indicates genes on different chromosomes, while a colon indicates that both genes are on the same chromosome with the distance between the parts being given after the colon);
  • Strands: orientation of the two chromosomes (e.g. ff indicates that both chromosomes are in the forward direction);
  • Spanning reads: the number of reads spanning the fusion.

All the columns can be sorted by using the arrow buttons in the column headers.
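As an illustration of how the report columns can be read, the following Python sketch decodes a Strands value and ranks rows by Spanning reads. It assumes the report rows are available as dictionaries keyed by the column names listed above; the functions `describe_strands` and `rank_by_spanning_reads`, the `min_reads` threshold, and the interpretation of `r` as the reverse direction are assumptions of this sketch, not part of Partek Flow.

```python
from typing import Dict, List, Tuple

# "f" is described in the text; "r" for reverse is an assumption of this sketch.
STRAND_NAMES = {"f": "forward", "r": "reverse"}


def describe_strands(code: str) -> Tuple[str, str]:
    """Translate a two-letter Strands code (e.g. "ff") into readable directions."""
    left, right = code  # raises ValueError unless the code has exactly two letters
    return STRAND_NAMES[left], STRAND_NAMES[right]


def rank_by_spanning_reads(rows: List[Dict], min_reads: int = 1) -> List[Dict]:
    """Keep rows with at least min_reads spanning reads, best supported first."""
    kept = [row for row in rows if row["Spanning reads"] >= min_reads]
    return sorted(kept, key=lambda row: row["Spanning reads"], reverse=True)
```

Ranking by Spanning reads mirrors what sorting that column in the report does; the threshold is only a convenience for trimming weakly supported candidates.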

 

Figure 11: Fusion report of TopHat-Fusion fusion gene detection algorithm. Each row represents a fusion gene candidate (an example is shown)

Moreover, the Fusion attribute report, when invoked from the Fusion results node, displays a report on attributes of detected fusion genes. Attributes to be tested for association with the fusion should be specified first (Figure 12).

 

Figure 12: Selecting attributes to be tested for association with fusion events (“ER Status” shown as an example)

A new data node, Fusion attribute report, is generated in the Analysis tab (Figure 13) and it provides access to the Task report link in the toolbox.

 

Figure 13: Fusion attribute report node as a result of annotating Fusion results generated by TopHat-Fusion algorithm

The output, the Fusion attribute report table (Figure 14), resembles the basic TopHat-Fusion output (Figure 11); each row of the table is a single fusion event, and the three right-most columns are as follows:

  • p-value: p-value for the chi-squared test comparing the observed number of counts across the levels of the attribute specified in the setup;
  • % in (attribute level): fraction of reads detected within the samples belonging to the specified level of the attribute (each level is presented as a single column).
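The two kinds of columns can be illustrated with a small Python sketch. The exact construction of the test used by Partek Flow is not described here, so the sketch makes loud assumptions: the attribute has exactly two levels (as in the ER– vs. ER+ example), the null hypothesis is an equal expected count in both levels, and the function name `fusion_attribute_summary` is hypothetical. With two levels the test has one degree of freedom, so the p-value can be computed with the complementary error function from the standard library.

```python
import math
from typing import Dict, Tuple


def fusion_attribute_summary(counts: Dict[str, int]) -> Tuple[float, float, Dict[str, float]]:
    """Return (chi-squared statistic, p-value, percentage per level).

    counts maps each attribute level to the number of fusion reads observed in
    samples of that level. Assumes exactly two levels and equal expected counts.
    """
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two attribute levels")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fusion reads observed")
    expected = total / 2  # equal split under the assumed null hypothesis
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts.values())
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    percent = {level: 100.0 * observed / total for level, observed in counts.items()}
    return chi2, p_value, percent
```

For example, 30 fusion reads in one level and 10 in the other give a statistic of 10 and percentages of 75 and 25; attributes with more than two levels would need a general chi-squared distribution function, which this sketch deliberately leaves out.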

 

Figure 14: Fusion attribute report of TopHat-Fusion fusion gene detection algorithm. Each row represents a fusion gene candidate (the example shows comparison of number of fusion reads detected in ER– group vs. ER+ group, with the p-value based on χ² test)

STAR Algorithm

General Overview

The STAR aligner (see Chapter 6.1) can also detect fusion genes (referred to as “chimeric alignments”) (5). During the first phase of alignment, STAR searches for maximal mappable prefixes (seeds) of sequencing reads. In the second phase, all the seeds that align within user-defined genomic windows are stitched together. If an alignment within one genomic window does not cover the entire read sequence, STAR will try to find two or more windows that cover the entire read. This essentially results in detection of fusion events, with different parts of a read aligning to distal genomic locations, different chromosomes, or different strands.
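The window logic of the second phase can be sketched in Python as follows. This is a toy model, not STAR's implementation: the `Seed` record, the fixed-size binning of genomic positions into windows, the restriction to pairs of windows, and the labels returned by `classify_read` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Seed(NamedTuple):
    """Hypothetical seed: read interval [read_start, read_end) and its genomic hit."""
    read_start: int
    read_end: int
    chrom: str
    strand: str
    genome_pos: int


WindowKey = Tuple[str, str, int]


def covered_bases_by_window(seeds: List[Seed], window_size: int) -> Dict[WindowKey, Set[int]]:
    """Group seeds into windows (chromosome, strand, position bin) and collect
    the read bases each window covers."""
    windows: Dict[WindowKey, Set[int]] = defaultdict(set)
    for seed in seeds:
        key = (seed.chrom, seed.strand, seed.genome_pos // window_size)
        windows[key].update(range(seed.read_start, seed.read_end))
    return dict(windows)


def classify_read(seeds: List[Seed], read_length: int, window_size: int) -> str:
    """Return "linear" if one window covers the whole read, "chimeric" if two
    windows together do, and "unresolved" otherwise."""
    windows = covered_bases_by_window(seeds, window_size)
    full_read = set(range(read_length))
    if any(covered >= full_read for covered in windows.values()):
        return "linear"
    keys = list(windows)
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            if windows[first] | windows[second] >= full_read:
                return "chimeric"
    return "unresolved"
```

Because the window key includes chromosome, strand and position bin, the "chimeric" outcome covers the three cases named in the text: distal locations, different chromosomes, and different strands.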

The most up-to-date STAR version implemented in Partek Flow when the manual was written (2.3.0) aligns both paired- and single-end reads. Color-space reads are not supported.

Running STAR Chimeric Alignment within Partek Flow

The STAR fusion detection algorithm is integrated with the STAR aligner, and fusion detection is activated by selecting the Chimeric alignment option in the Advanced options of the aligner (the Advanced options dialog is reached via the Configure link in the setup dialog). As soon as Chimeric alignment is selected, additional options, specific to the fusion search algorithm, are shown (Figure 15). For a discussion of the option details, see the STAR documentation.

 

Figure 15: Controls of the STAR fusion gene detection algorithm (aligner defaults are shown)

The output is associated with the Chimeric results data node (Figure 16), which is a part of the STAR results (in addition to the Aligned reads node and, optionally, the Unaligned reads node).

 

Figure 16: Chimeric results node as a result of STAR’s chimeric alignment algorithm

Selecting the Chimeric results node opens the toolbox, with Variant detection options (Figure 17).

 

Figure 17: Variant detection options invokable on STAR’s chimeric alignment results

Fusion report displays an annotated report on detected fusion genes. For that purpose an annotation file needs to be specified first (Figure 18).

 

Figure 18: Selecting an annotation file to annotate STAR’s chimeric alignment results (“Fusion gene” project shown as an example)

The result of annotation is the Fusion report task node as seen in Figure 19.

 

Figure 19: Fusion report task node as a result of annotating Chimeric results generated by STAR’s chimeric alignment

The list of annotated fusion genes, in the form of a Fusion report (Figure 20), can be obtained by first selecting the Fusion report task node and then the Task report link in the toolbox. Each row of the table in Figure 20 is a potential fusion event, with the columns providing the following information.

  • Sample ID: sample in which the fusion event was identified;
  • Score: not applicable;
  • Type1: genomic section of the left-hand part of the fusion;
  • Gene1: gene on the left side of the fusion;
  • Transcript1: affected transcript of the Gene1;
  • Type2: genomic section of the right-hand part of the fusion;
  • Gene2: gene on the right side of the fusion;
  • Transcript2: affected transcript of the Gene2;
  • Loci: coordinates of the fusion event (a dash indicates genes on different chromosomes, while a colon indicates that both genes are on the same chromosome with the distance between the parts being given after the colon);
  • Strands: orientation of the two chromosomes (e.g. ff indicates that both chromosomes are in the forward direction);
  • Spanning reads: the number of reads spanning the fusion.

All the columns can be sorted by using the arrow buttons in the column headers.

 

 

Figure 20: Fusion report of STAR’s chimeric alignment fusion gene detection algorithm. Each row represents a fusion gene candidate (an example is shown; Score is not applicable to STAR)

 

Moreover, the Fusion attribute report, when invoked from the Chimeric results node, displays a report on attributes of detected fusion genes. Attributes to be tested for association with the fusion should be specified first (Figure 21).

 

 

Figure 21: Selecting attributes to be tested for association with fusion events (“ER Status” shown as an example)

 

A new data node, Fusion attribute report, is generated in the Analysis tab (Figure 22) and it provides access to the Task report link in the toolbox.

 

 

Figure 22: Fusion attribute report node as a result of annotating Chimeric results generated by STAR’s chimeric alignment algorithm

 

The output, the Fusion attribute report table (Figure 23), resembles the basic TopHat-Fusion output (Figure 11); each row of the table is a single fusion event, and the three right-most columns are as follows:

  • p-value: p-value for the chi-squared test comparing the observed number of counts across the levels of the attribute specified in the setup;
  • % in (attribute level): fraction of reads detected within the samples belonging to the specified level of the attribute (each level is presented as a single column).

 

 

Figure 23: Fusion attribute report of STAR’s chimeric alignment algorithm. Each row represents a fusion gene candidate (the example shows comparison of number of fusion reads detected in ER– group vs. ER+ group, with the p-value based on χ² test)

 

 

References

  1. Annala MJ, Parker BC, Zhang W, Nykter M. Fusion genes and their discovery using high throughput sequencing. Cancer Lett. 2013;340:192-200.
  2. Costa V, Aprile M, Esposito R, Ciccodicola A. RNA-Seq and human complex diseases: recent accomplishments and future perspectives. Eur J Hum Genet. 2013;21:134-142.
  3. Kim D, Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biology. 2011;12:R72.
  4. TopHat-Fusion. An algorithm for discovery of novel fusion transcripts. http://tophat.cbcb.umd.edu/fusion_index.html Accessed on April 25, 2014.
  5. Dobin A, Davies CA, Schlesinger F et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15-21.

 

 
