Partek Flow Documentation

...

Motif detection identifies enriched sequence motifs in peak regions generated from ChIP-Seq and ATAC-Seq data. Partek Flow includes Search for known motifs, which searches for known motifs from a user-specified set or a database, and Detect de novo motifs, which can identify novel motifs. These tasks can be invoked on data nodes with genomic regions as features (not genes or transcripts).

Search for known motifs

Given a set of genomic regions, Search for known motifs can search for enrichment based on a sequence provided by the user or on a sequence database such as JASPAR.

...

The configuration dialog offers two search methods, By sequence and By database. 

By sequence

By sequence has two options (Figure 1):

...

From file: the user can specify a text file (.txt) that contains a list of sequences, one sequence per row.

 


Figure 1. Search for known motifs by specifying a sequence manually

The By sequence option uses a string search tool to return all positions in the set of genomic regions that match the given string(s). The string match is case insensitive: if you search for ATCG, you may get atcG as a match. Lower-case nucleotides have been "repeat masked", meaning they are located in a repetitive region of the genome. Your search string may contain any of the characters from the IUPAC nucleotide code. For example, if you search for WAAA, you may get back AAAA or TAAA (in any combination of upper and lower case), because W represents A or T.
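
The matching rule described above can be illustrated with a minimal Python sketch. This is not Partek Flow's implementation; the function names `iupac_to_regex` and `find_matches` are hypothetical, and the sketch simply expands each IUPAC code into the set of bases it stands for and searches without regard to case.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(pattern):
    """Translate an IUPAC search string into a case-insensitive regex."""
    parts = []
    for ch in pattern.upper():
        bases = IUPAC[ch]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    return re.compile("(?=(" + "".join(parts) + "))", re.IGNORECASE)

def find_matches(pattern, sequence):
    """Return (start, matched_text) for every position that matches the pattern."""
    regex = iupac_to_regex(pattern)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

# W stands for A or T, so both TAAA and the repeat-masked aaaa are reported.
print(find_matches("WAAA", "ccTAAAgaaaaT"))  # [(2, 'TAAA'), (7, 'aaaa')]
```

Start positions in this sketch are 0-based offsets within the sequence string; mapping them back to genomic coordinates is left out.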

By database

The By database option uses an alignment matrix to match sequences against a motif database. We distribute the JASPAR database, but you can add any custom or public motif database. 

...

Let h be the maximum of L_A. The quality score of a sequence S is calculated as Q_A(S) = L_A(S)/h. A quality score of 1 corresponds to a sequence with the most likely base at each position of the alignment matrix. The user will specify a threshold Q_A. All sequences that have a score T_A > Q_A * h will be reported.
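
As a rough illustration of the quality score, the Python sketch below scans a sequence with a toy matrix. It rests on assumptions that are not stated in this section: that L_A(S) is the sum of one per-position score for each base of S, and that h is therefore the sum of the highest score in each matrix column. The matrix values and the names `max_score`, `quality`, and `scan` are hypothetical.

```python
# Toy score matrix: one dict per motif position (hypothetical values).
MATRIX = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": 0.0},
    {"A": -1.0, "C": 2.0, "G": 0.0, "T": -1.0},
    {"A": 0.0, "C": -1.0, "G": 2.0, "T": -1.0},
]

def max_score(matrix):
    """h: the best attainable score, taking the top-scoring base in every column."""
    return sum(max(column.values()) for column in matrix)

def quality(matrix, site):
    """Q_A(S) = L_A(S) / h for a site with the same length as the matrix."""
    # Assumes the site contains only A, C, G, T (upper or lower case).
    score = sum(column[base] for column, base in zip(matrix, site.upper()))
    return score / max_score(matrix)

def scan(matrix, sequence, threshold):
    """Report (start, site, quality) for every window whose quality exceeds the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        q = quality(matrix, site)
        if q > threshold:
            hits.append((start, site, round(q, 3)))
    return hits

hits = scan(MATRIX, "ttACGaACTg", threshold=0.4)
```

With these toy values, the window ACG uses the best base in every column and so scores 1.0, while ACT scores 0.5. Testing Q_A(S) against the threshold is equivalent to testing the raw score against threshold * h whenever h is positive.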

Task report

The Search for known motifs task report contains summary and detail tabs.

 


Figure 2. Search for known motifs task report summary tab

...

The detail tab lists motif sequence locations on rows and includes the quality score for each instance (Figure 3). 

 


Figure 3. Search for known motifs task report detail tab

Detect de novo motifs

Detect de novo motifs can be used to identify novel motifs that are enriched in the input regions. 

...

The Gibbs sampler is run for each motif size in the range specified by the user. The motif with the greatest average LLR (LLR/length) is returned. To find N motifs in a set of regions, the Gibbs sampling method is run N times. The motif instances found in the previous run of the Gibbs sampler are removed before the next run is performed.
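
The selection and repetition logic around the sampler can be sketched in Python as follows. The Gibbs sampler itself is not implemented here: `sampler` is a hypothetical callable that takes the regions and a motif width and returns an LLR together with a list of (region index, start) instances. Replacing found instances with N is one assumed way of "removing" them; the text above does not say how removal is done.

```python
def best_motif(regions, widths, sampler):
    """Run the sampler once per width and keep the result with the highest LLR/length."""
    best = None
    for width in widths:
        llr, instances = sampler(regions, width)
        candidate = (llr / width, width, instances)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best

def mask_instances(regions, width, instances):
    """Overwrite each reported instance with N so a later run cannot report it again."""
    masked = [list(region) for region in regions]
    for region_index, start in instances:
        masked[region_index][start:start + width] = "N" * width
    return ["".join(chars) for chars in masked]

def find_motifs(regions, widths, n_motifs, sampler):
    """Find n_motifs motifs by running the per-width search n_motifs times."""
    found = []
    for _ in range(n_motifs):
        avg_llr, width, instances = best_motif(regions, widths, sampler)
        found.append((avg_llr, width, instances))
        # Remove this motif's instances before the next run.
        regions = mask_instances(regions, width, instances)
    return found
```

Dividing the LLR by the motif length keeps longer motifs from winning simply because they have more positions contributing to the score.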

Task report

The task report includes summary and detail tabs. 

The summary tab gives the consensus sequence and sequence logo for each detected motif (Figure 5). 

 


Figure 5. Detect de novo motifs task report summary tab

The detail tab is similar to the detail tab in Search for known motifs and lists the location of each motif sequence (Figure 6).

 


Figure 6. Detect de novo motifs task report detail tab

References

  1. Hertz, G.Z., & Stormo, G.D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 1999, 15: 563-577.
  2. Schug, J., & Overton, C.G. TESS, Transcription Element Search Software on WWW. Technical Report CBIL-TR-1997-1001-v0.0, Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania, 1997.
  3. Neuwald, A.F., Liu, J.S., & Lawrence, C.E. Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Science 1995, 4: 1618-1632.