Partek Flow Documentation


...

Text file format: the output is a .txt file that can be opened in any text editor or in Microsoft Excel. Each row is a transcript and each column is a sample.
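As a rough illustration of the layout described above (rows are transcripts, columns are samples), the following sketch loads such a file into a nested dictionary. The tab delimiter, the header row of sample names, the first column holding the transcript name, and all example values are assumptions made for this sketch; they are not confirmed by this page.

```python
import csv
import io


def read_quantification_table(handle, delimiter="\t"):
    """Return {transcript: {sample: value}} from a transcripts-by-samples table.

    Assumes (hypothetically) a header row of sample names and the
    transcript name in the first column.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        transcript, values = row[0], row[1:]
        table[transcript] = {s: float(v) for s, v in zip(samples, values)}
    return table


# Hypothetical file content, used only to exercise the function.
example = "Transcript\tSample_A\tSample_B\nNM_0001\t12\t30\nNM_0002\t0\t7.5\n"
table = read_quantification_table(io.StringIO(example))
print(table["NM_0002"]["Sample_B"])
```

For a real download, the `io.StringIO` object would be replaced by an opened file handle, and the delimiter adjusted if the file turns out not to be tab-separated.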

Figure. Download quantification output data dialog: data can be downloaded in two formats, Partek Genomics Suite project format or text file format

Quantify to transcriptome (Cufflinks)

Cufflinks assembles transcripts and estimates transcript abundances from aligned reads. Implementation details are explained in Trapnell et al. [1].

The Cufflinks task has three options that can be configured (Figure 11):

Figure 11. Cufflinks configuration dialog

  • Novel transcript: this option does not require an annotation reference. It performs de novo assembly to reconstruct transcripts and estimates their abundance.
  • Annotation transcript: this option requires an annotation model and quantifies the aligned reads against the known transcripts in the annotation file.
  • Novel transcript with annotation as guide: this option requires an annotation file. It quantifies the aligned reads against known transcripts and also assembles aligned reads into novel transcripts. The results include all transcripts in the annotation file plus any novel transcripts that are assembled.

When the Use bias correction check box is selected, Cufflinks uses the genome sequence information to look for overrepresented sequences and corrects for them, which improves the accuracy of transcript abundance estimates.
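For readers familiar with the standalone Cufflinks command line, the three options and the bias correction check box correspond roughly to the flags sketched below. How Partek Flow invokes Cufflinks internally is not described on this page, so this mapping is an assumption; the function name, mode names, and file names are hypothetical. The flags themselves (`-G`, `-g`, `-b`, `-o`) are those of the standalone Cufflinks program.

```python
def build_cufflinks_args(bam, mode, annotation=None, genome_fasta=None,
                         output_dir="cufflinks_out"):
    """Build a Cufflinks argument list for one of the three dialog options.

    mode is a hypothetical label: "novel", "annotation", or "guided".
    """
    args = ["cufflinks", "-o", output_dir]
    if mode == "novel":
        pass  # de novo assembly: no annotation flag
    elif mode in ("annotation", "guided"):
        if annotation is None:
            raise ValueError(f"mode '{mode}' requires an annotation file")
        # -G quantifies against known transcripts only;
        # -g uses the annotation as a guide and also reports novel transcripts.
        args += ["-G" if mode == "annotation" else "-g", annotation]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if genome_fasta is not None:
        # Bias correction needs the genome sequence.
        args += ["-b", genome_fasta]
    args.append(bam)
    return args


print(build_cufflinks_args("sample.bam", "guided",
                           annotation="genes.gtf", genome_fasta="genome.fa"))
```

The returned list could be passed to `subprocess.run` on a system where Cufflinks is installed; it is built as a list rather than a string so that file names containing spaces need no quoting.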

Quantify to reference (Partek E/M)

This task does not need an annotation model file, since the annotation is retrieved from the BAM file itself. The sequence names in the BAM files constitute the features against which the reads are quantified.

This task is generally performed on reads aligned to a transcriptome, e.g., when a species does not have a genome reference and the BAM files contain transcriptome information. In this case, the features for this quantification task are the reference sequence names in the input BAM files.

There are two parameters in Quantify to reference (Figure 12):

 

Figure 12. Quantify to reference dialog

  • Min coverage: filters out any feature (sequence name) that has fewer reads across all samples than the specified value.
  • Strict paired-end compatibility: this only affects paired-end data. When it is checked, only reads that have both ends aligned to the same feature are counted. Otherwise, a read is still counted as an exonic compatible read even if its mate is not compatible with the feature.
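The effect of the Strict paired-end compatibility option can be sketched with a toy counter. This is a simplified illustration, not Partek's implementation: each pair is represented only by the feature name each end aligned to (`None` for an unaligned end), and in the non-strict case a pair is assumed to count once for every feature that either end aligns to. The function and variable names are hypothetical.

```python
from collections import Counter


def count_pairs(pairs, strict):
    """Count read pairs per feature.

    pairs: iterable of (feature_of_end1, feature_of_end2); None = unaligned.
    strict: if True, count a pair only when both ends hit the same feature.
    """
    counts = Counter()
    for ref1, ref2 in pairs:
        if strict:
            if ref1 is not None and ref1 == ref2:
                counts[ref1] += 1
        else:
            # Assumption: count the pair once for each feature hit by either end.
            for ref in {ref1, ref2} - {None}:
                counts[ref] += 1
    return counts


pairs = [("tx1", "tx1"), ("tx1", "tx2"), ("tx2", None)]
strict_counts = count_pairs(pairs, strict=True)
lenient_counts = count_pairs(pairs, strict=False)
print(dict(strict_counts))   # only the first pair qualifies
print(dict(lenient_counts))
```

With the option checked, only the first pair is counted (for tx1); with it unchecked, the second and third pairs also contribute.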

During quantification:

  1. We scan through each of the BAM files and find all the transcripts that meet the minimum coverage threshold.
  2. With those transcripts, we "create" an annotation file in which the sequence name, the Gene ID, and the Transcript ID are all set to the transcript name. The start position is 1 and the end position is the length of the transcript.
  3. In effect, the annotation file filters out the low-coverage transcripts.
  4. Since we don't know where the transcripts are in the genome, chromosome view will display only one transcript at a time (i.e., the transcript names are treated like "chromosomes").
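Steps 1 to 3 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: per-sample read counts and reference lengths are supplied as plain dictionaries instead of being read from BAM files, the minimum coverage is compared against the read total summed over all samples, and every name in the code is hypothetical.

```python
def build_annotation(read_counts_per_sample, reference_lengths, min_coverage):
    """Create one annotation record per transcript that passes min_coverage.

    read_counts_per_sample: {sample: {transcript: read_count}}
    reference_lengths: {transcript: length}, e.g. taken from the BAM header
    """
    # Step 1: total the reads for each transcript across all samples.
    totals = {}
    for counts in read_counts_per_sample.values():
        for name, n in counts.items():
            totals[name] = totals.get(name, 0) + n

    # Steps 2 and 3: keep transcripts at or above the threshold; the
    # transcript name serves as sequence name, Gene ID, and Transcript ID.
    annotation = []
    for name in sorted(totals):
        if totals[name] >= min_coverage:
            annotation.append({
                "sequence": name,
                "gene_id": name,
                "transcript_id": name,
                "start": 1,
                "end": reference_lengths[name],
            })
    return annotation


counts = {"S1": {"tx1": 8, "tx2": 1}, "S2": {"tx1": 5, "tx3": 2}}
lengths = {"tx1": 1500, "tx2": 900, "tx3": 400}
annotation = build_annotation(counts, lengths, min_coverage=3)
print(annotation)
```

In this example only tx1 (13 reads in total) survives the threshold of 3, so the generated annotation contains a single record spanning positions 1 to 1500.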

The output data node will display a Task report similar to that of the Quantify to annotation model task.


 

References

  1. Trapnell C, Williams B, Pertea G, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010; 28(5):511-515.
  2. Xing Y, Yu T, Wu YN, Roy M, Kim J, Lee C. An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs. Nucleic Acids Res. 2006; 34(10):3150-60.


...