A fusion gene is a hybrid gene that combines parts of two or more original genes. Fusion genes can form as a result of chromosomal rearrangements (such as translocation, interstitial deletion, or chromosomal inversion) or abnormal transcription, and have been shown to act as drivers of malignant transformation and/or progression in various neoplasms (1). The discovery and characterization of fusion genes have been greatly facilitated by the use of next-generation sequencing (NGS) (2), and several computational algorithms have been developed to detect them.
This chapter illustrates how to detect fusion genes by:
Partek Algorithm
General Overview
The Partek® Flow® fusion detection algorithm uses paired-end information to find pairs of genes that may be expressed as a hybrid. A paired-end read is considered for a fusion event if:
- an alignment from the first-in-pair maps to a different sequence (chromosome) than an alignment from the second-in-pair; or
- the distance between all alignments from the first-in-pair and the second-in-pair exceeds a custom-defined threshold (default: 50 kb).
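The two criteria above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Partek's implementation: the Alignment type, the is_fusion_candidate function, and the use of a single mapped position per alignment to measure distance are all assumptions made for the example.

```python
from itertools import product
from typing import NamedTuple, Sequence


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not a Partek data type)."""
    chrom: str  # reference sequence (chromosome) name
    pos: int    # mapped position in bp (assumed: one coordinate per alignment)


def is_fusion_candidate(
    first: Sequence[Alignment],
    second: Sequence[Alignment],
    min_distance: int = 50_000,  # default threshold from the text: 50 kb
) -> bool:
    """Return True if a read pair meets either criterion listed above."""
    pairs = list(product(first, second))
    if not pairs:
        # One mate has no alignments, so the pair cannot be evaluated.
        return False
    # Criterion 1: some first-in-pair alignment lies on a different
    # sequence than some second-in-pair alignment.
    if any(a.chrom != b.chrom for a, b in pairs):
        return True
    # Criterion 2: all alignments share a sequence here, and every
    # first/second combination is farther apart than the threshold.
    return all(abs(a.pos - b.pos) > min_distance for a, b in pairs)
```

Note that under the second criterion a multi-mapped pair is rejected as soon as any combination of alignments falls within the threshold, which follows the "all alignments" wording above.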
The algorithm then reports peaks of reads that are potentially involved in a fusion event. Adjacent peaks are merged if their distance is less than 200 bp (default), and the probability that the peak is derived from the null distribution of peaks (determined by permutation) is reported. False-positive hits are reduced by ignoring alignments that overlap with regions masked in the .2bit file. Finally, the peaks are annotated with a transcript model and a report is generated for pairs of peaks that map to different transcripts.
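The peak-merging step can be illustrated with a short Python sketch. It assumes that peaks are simple (chromosome, start, stop) tuples and that "distance" means the gap between the end of one peak and the start of the next; both are assumptions for illustration, and merge_peaks is a hypothetical helper rather than part of Partek Flow.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, stop), hypothetical layout


def merge_peaks(peaks: List[Peak], window_gap: int = 200) -> List[Peak]:
    """Merge peaks on the same chromosome separated by less than window_gap bp."""
    merged: List[Peak] = []
    # Sorting by chromosome and then start makes neighbors adjacent in the list.
    for chrom, start, stop in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < window_gap:
            # Close enough to the previous peak: extend it instead of adding a new one.
            _, prev_start, prev_stop = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_stop, stop))
        else:
            merged.append((chrom, start, stop))
    return merged


example = [("chr1", 100, 200), ("chr1", 350, 400), ("chr1", 700, 800)]
print(merge_peaks(example))
```

In this example the first two peaks are 150 bp apart and are merged, while the third lies 300 bp beyond the merged peak and stays separate.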
Running Partek Fusion Gene Algorithm within Partek Flow
The Partek algorithm can be invoked on a data node containing aligned paired-end reads (i.e., an Aligned reads node) through the Detect fusion genes link in the Variant detection section of the toolbox (Figure 1).
First, the genome build that should be used for fusion gene detection needs to be specified (Figure 2).
The next dialog (Fusion options; Figure 3) allows for optimization of several parameters. Min distance between ends specifies the minimum distance (bp) between the first-in-pair and second-in-pair reads for them to be considered for a fusion event, while Window gap (bp) defines the minimum distance that needs to be detected between two neighboring fusion candidates in order to label them as independent fusion events. The Annotation model is required to annotate the components of the fusion gene in the output table (see below).
As a result, a new data node (Fusion) will be created (Figure 4). Selecting the Fusion node opens the toolbox and the list of fusion genes can then be reached via the Task report link.
An example of the output (the Fusion report) is shown in Figure 5. Each row of the table is a potential fusion event, and the columns provide the following information:
- Chromosome1: chromosome ID for the gene on the left side of the fusion;
- Start1: start position of the segment on the left;
- Stop1: stop position of the segment on the left;
- Chromosome2: chromosome ID for the gene on the right side of the fusion;
- Start2: start position of the segment on the right;
- Stop2: stop position of the segment on the right;
- Sample ID: sample in which the fusion event was identified;
- Counts: number of supporting reads;
- p-value: p-value for the chi-squared test comparing the observed number of counts against the expected number (background distribution);
- Gene1: gene on the left side of the fusion;
- Gene2: gene on the right side of the fusion.
All columns can be sorted by using the arrow buttons in the column headers.
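If the report contents are handled outside Partek Flow, the same columns lend themselves to simple filtering and sorting. The Python sketch below assumes each row is available as a dictionary keyed by the column names listed above; how the rows are obtained, the shortlist function, and the cut-off values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]  # one report row keyed by column name


def shortlist(rows: List[Row], max_p: float = 0.05, min_counts: int = 5) -> List[Row]:
    """Keep well-supported fusion candidates and order them by significance.

    max_p and min_counts are illustrative cut-offs, not Partek defaults.
    """
    kept = [r for r in rows if r["p-value"] <= max_p and r["Counts"] >= min_counts]
    # Smallest p-value first; ties are broken by the larger number of supporting reads.
    return sorted(kept, key=lambda r: (r["p-value"], -r["Counts"]))


# Made-up rows with placeholder gene names, used only to show the call.
rows = [
    {"Gene1": "GENE_A", "Gene2": "GENE_B", "Counts": 12, "p-value": 0.001},
    {"Gene1": "GENE_C", "Gene2": "GENE_D", "Counts": 2, "p-value": 0.0005},
    {"Gene1": "GENE_E", "Gene2": "GENE_F", "Counts": 30, "p-value": 0.2},
]
for row in shortlist(rows):
    print(row["Gene1"], row["Gene2"], row["Counts"], row["p-value"])
```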
Additional Assistance
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.