...

Let M be a motif of length L consisting of N motif instances. Let A be a 4×L alignment matrix such that a_i,j is the count of letter i at position j. Let B_i be the background frequency of letter i (calculated as the number of occurrences of nucleotide i in the regions divided by the total number of nucleotides in the regions). Let S be a sequence of length L. The score of S given the alignment matrix is

Equation 1

Let h be the maximum possible value of L_A over all sequences of length L. The quality score of a sequence is calculated as Q_A(S) = L_A(S)/h. A quality score of 1 corresponds to a sequence with the most likely base at each position of the alignment matrix. The user specifies a quality threshold Q_A, and all sequences with a score L_A(S) > T_A, where T_A = Q_A * h, are reported.
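Equation 1 is not reproduced on this page, so the following is only a sketch of how the score, the quality score Q_A(S), and the reporting threshold T_A = Q_A * h fit together. It assumes a Hertz & Stormo-style log-odds weight ln(((a_i,j + B_i)/(N + 1))/B_i) at each position, with the background frequency used as a pseudocount; the exact form of Equation 1 may differ. The function names are hypothetical.

```python
import math

BASES = "ACGT"


def count_matrix(instances):
    """Build the 4 x L alignment matrix A (counts[b][j] = count of base b at position j)."""
    length = len(instances[0])
    counts = {b: [0] * length for b in BASES}
    for site in instances:
        for j, base in enumerate(site):
            counts[base][j] += 1
    return counts


def weight_matrix(counts, background):
    """Per-position log-odds weights w[j][b].

    ASSUMPTION: Hertz & Stormo-style weights with pseudocount B_b,
    w = ln(((a_bj + B_b) / (N + 1)) / B_b); Equation 1 may differ.
    """
    n_instances = sum(counts[b][0] for b in BASES)  # every column sums to N
    length = len(counts["A"])
    return [
        {b: math.log((counts[b][j] + background[b]) / (n_instances + 1) / background[b])
         for b in BASES}
        for j in range(length)
    ]


def score(seq, weights):
    """L_A(S): sum of the weights of the observed bases."""
    return sum(weights[j][base] for j, base in enumerate(seq))


def max_score(weights):
    """h: the largest achievable L_A, using the top-weighted base at each position."""
    return sum(max(column.values()) for column in weights)


def scan(region, weights, q_threshold):
    """Yield (offset, window, quality) for every window with L_A(S) > T_A = Q_A * h."""
    length = len(weights)
    h = max_score(weights)
    t_a = q_threshold * h
    for start in range(len(region) - length + 1):
        window = region[start:start + length]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other non-ACGT letters
        s = score(window, weights)
        if s > t_a:
            yield start, window, s / h


uniform = {b: 0.25 for b in BASES}
weights = weight_matrix(count_matrix(["ACGT", "ACGT", "ACGA"]), uniform)
hits = list(scan("TTACGTAA", weights, q_threshold=0.9))
```

With these toy instances, only the window ACGT at offset 2 passes the 0.9 threshold, and its quality score is 1 because it uses the top-weighted base at every position.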

The report contains summary and detail tabs:

Figure 2. Summary page of the known motif search results

In the summary table (Figure 2), each row is a motif in the specified search database. Clicking a motif name opens its database page, which shows more detailed information about the motif.

The probability P_Expected of a sequence having a score above T_A is calculated under the assumption that the bases are i.i.d. according to the background distribution B. Let N_Trials be the number of sequences compared to the alignment matrix. The expected number of occurrences of the motif in the regions is P_Expected * N_Trials. The p-value of observing N_Actual or more instances with a score above T_A is calculated from the binomial distribution with N_Trials trials and success probability P_Expected. A low p-value indicates that the regions are enriched with instances of the motif.
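A minimal sketch of these two calculations, assuming the per-position scores are supplied as a list of base-to-weight dictionaries (for example, log-odds weights derived from the alignment matrix). P_Expected is computed exactly by building the score distribution one position at a time under the i.i.d. background model, and the p-value is the binomial upper tail P(X >= N_Actual). The function names and toy numbers are hypothetical, not Partek Flow output.

```python
import math


def p_expected(weights, background, t_a):
    """P(L_A(S) > T_A) for a random S whose bases are i.i.d. from background.

    Builds the exact score distribution position by position; the number of
    distinct scores can grow as 4**L, so this suits short motifs.
    """
    dist = {0.0: 1.0}  # partial score -> probability
    for column in weights:
        nxt = {}
        for partial, prob in dist.items():
            for base, w in column.items():
                key = round(partial + w, 9)  # merge sums that differ only by rounding noise
                nxt[key] = nxt.get(key, 0.0) + prob * background[base]
        dist = nxt
    return sum(prob for s, prob in dist.items() if s > t_a)


def binomial_p_value(n_actual, n_trials, p):
    """P(X >= n_actual) for X ~ Binomial(n_trials, p), summed in log space."""
    if n_actual <= 0:
        return 1.0
    if n_actual > n_trials or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n_trials + 1)
    total = 0.0
    for k in range(n_actual, n_trials + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n_trials - k + 1)
                    + k * log_p + (n_trials - k) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


# Toy two-position matrix: only base A scores, so L_A(S) > 1.5 requires "AA".
toy_weights = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}] * 2
uniform = {b: 0.25 for b in "ACGT"}
p = p_expected(toy_weights, uniform, t_a=1.5)  # 1/16
n_trials = 1000                                # hypothetical N_Trials (windows scanned)
expected = p * n_trials                        # P_Expected * N_Trials
p_value = binomial_p_value(120, n_trials, p)   # hypothetical N_Actual = 120
```

If SciPy is available, `scipy.stats.binom.sf(n_actual - 1, n_trials, p)` gives the same upper-tail probability; the log-space loop above just keeps the sketch dependency-free and avoids overflow in the binomial coefficients.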

In the detail table (Figure 3), each row is a location containing a motif sequence, along with its quality score.

Figure 3. Detail table of the known motif search report


Detect de novo motifs

Click the Peak data node, select Detect de novo motifs from the Motif detection section of the pop-up menu (Figure 4), and specify the number of motifs to detect and the length of the motifs.

Figure 4. Detect de novo motifs dialog

Motif discovery is done using Gibbs motif sampling. Partek Flow's implementation of Gibbs motif sampling is based on Neuwald et al. [3]. The Gibbs sampling method is a stochastic procedure that attempts to find the subset of sequences within the regions that maximizes the log likelihood ratio (LLR):

Equation 2

This is done by repeating the following two steps until convergence:

...

The Gibbs sampler is run on a range of motif sizes specified by the user, and the motif with the greatest average LLR (LLR/length) is returned. To find N motifs in a set of regions, the Gibbs sampling method is run N times. The motif instances found by the previous run of the Gibbs sampler are removed before the next run is performed.
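Equation 2 and the two sampler steps are not reproduced here, so the following is only a sketch under stated assumptions. It uses the standard LLR form from Hertz & Stormo, a simplified Gibbs site sampler that keeps exactly one site per sequence (not Neuwald et al.'s multi-site sampler and not Partek Flow's code), and a fixed number of iterations in place of a convergence test. It assumes uppercase A/C/G/T input and uses N to mask removed instances. The names `llr`, `gibbs_run`, and `find_motifs` are hypothetical. What it does illustrate is the outer procedure: trying each motif width, keeping the one with the greatest average LLR, and removing the chosen instances before the next run.

```python
import math
import random

BASES = "ACGT"


def llr(sites, background):
    """LLR of aligned sites vs. background.

    ASSUMPTION: standard form sum_j sum_b c_bj * ln((c_bj / N) / B_b);
    Equation 2 may differ in detail.
    """
    n = len(sites)
    total = 0.0
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        for b in BASES:
            c = column.count(b)
            if c:
                total += c * math.log((c / n) / background[b])
    return total


def gibbs_run(seqs, width, background, iters, rng):
    """Simplified one-site-per-sequence Gibbs site sampler.

    Returns (sequence index, start) pairs, or None if fewer than two sequences
    still have an unmasked window of this width.
    """
    cands = [[s for s in range(len(seq) - width + 1) if "N" not in seq[s:s + width]]
             for seq in seqs]
    idx = [i for i, c in enumerate(cands) if c]
    if len(idx) < 2:
        return None
    pos = {i: rng.choice(cands[i]) for i in idx}
    for _ in range(iters):  # fixed iteration count stands in for a convergence test
        for i in idx:
            # Profile from every other current site, with pseudocount B_b.
            others = [seqs[k][pos[k]:pos[k] + width] for k in idx if k != i]
            prof = [{b: (sum(o[j] == b for o in others) + background[b]) / (len(others) + 1)
                     for b in BASES} for j in range(width)]
            # Resample this sequence's site in proportion to its likelihood ratio.
            logs = [sum(math.log(prof[j][seqs[i][s + j]] / background[seqs[i][s + j]])
                        for j in range(width)) for s in cands[i]]
            top = max(logs)  # subtract the max before exp() for numerical stability
            pos[i] = rng.choices(cands[i], weights=[math.exp(v - top) for v in logs])[0]
    return [(i, pos[i]) for i in idx]


def find_motifs(seqs, n_motifs, widths, background, iters=100, seed=0):
    """Run the sampler n_motifs times, masking each motif's sites before the next run."""
    rng = random.Random(seed)
    seqs = list(seqs)  # local copy, so masking does not touch the caller's list
    motifs = []
    for _ in range(n_motifs):
        best = None
        for w in widths:
            run = gibbs_run(seqs, w, background, iters, rng)
            if run is None:
                continue
            sites = [seqs[i][p:p + w] for i, p in run]
            avg = llr(sites, background) / w  # average LLR = LLR / length
            if best is None or avg > best["avg_llr"]:
                best = {"width": w, "avg_llr": avg, "sites": sites, "locations": run}
        if best is None:
            break
        motifs.append(best)
        for i, p in best["locations"]:  # remove this motif's instances
            seqs[i] = seqs[i][:p] + "N" * best["width"] + seqs[i][p + best["width"]:]
    return motifs


# Hypothetical data: random regions with a planted site, background computed from them.
data_rng = random.Random(1)
seqs = ["".join(data_rng.choice(BASES) for _ in range(30)) for _ in range(10)]
seqs = [s[:10] + "TGACTCAG" + s[18:] for s in seqs]
background = {b: "".join(seqs).count(b) / sum(map(len, seqs)) for b in BASES}
motifs = find_motifs(seqs, n_motifs=2, widths=range(6, 9), background=background, iters=50)
```

Because the sampler is stochastic, different seeds can return different motifs; masking guarantees only that the second motif's instances never overlap the first's.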

The report contains summary and detail tabs:

The summary table contains the consensus sequence and sequence logo of each motif (Figure 5).

Figure 5. De novo motif summary table


The detail table contains the locations of the motif sequences (Figure 6).

Figure 6. De novo motif detail table


References

[1] Hertz, G.Z., & Stormo, G.D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 1999, 15: 563-577.
[2] Schug, J., & Overton, C.G. TESS, Transcription Element Search Software on WWW. Technical Report CBIL-TR-1997-1001-v0.0, Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania, 1997.
[3] Neuwald, A.F., Liu, J.S., & Lawrence, C.E. Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Science 1995, 4: 1618-1632.

...

Partek Flow Documentation
