Partek Flow Documentation

Cufflinks assembles transcripts and estimates their abundances from aligned reads. Implementation details are explained in Trapnell et al. [1].

The Cufflinks task has three options that can be configured (Figure 1):


Figure 1. Cufflinks configuration dialog
  • Novel transcript: this option does not require an annotation reference. It performs de novo assembly to reconstruct transcripts and estimate their abundances.
  • Annotation transcript: this option requires an annotation model. Aligned reads are quantified against the known transcripts in the annotation file.
  • Novel transcript with annotation as guide: this option requires an annotation file. Aligned reads are quantified against known transcripts and are also assembled into novel transcripts. The results include all transcripts in the annotation file plus any novel transcripts that are assembled.

When the Use bias correction check box is selected, Cufflinks uses the genome sequence to look for overrepresented sequences, which improves the accuracy of the transcript abundance estimates.
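
For readers who also run Cufflinks from the command line, the sketch below shows how the three dialog options and the bias correction check box could correspond to standard Cufflinks flags (`-G` for annotation-only quantification, `-g` for annotation-guided assembly, `-b` for bias correction with a genome FASTA). The function name, the mode labels, and the file names are hypothetical, and the mapping is an assumption: this page does not state how Partek Flow invokes Cufflinks internally.

```python
from typing import List, Optional


def build_cufflinks_command(
    mode: str,
    bam: str,
    annotation: Optional[str] = None,
    genome_fasta: Optional[str] = None,
) -> List[str]:
    """Build a Cufflinks argument list for one of the three dialog options.

    mode is a hypothetical label: "novel", "annotation", or "guided".
    """
    cmd = ["cufflinks"]
    if mode == "novel":
        pass  # de novo assembly: no annotation flag is needed
    elif mode in ("annotation", "guided"):
        if annotation is None:
            raise ValueError(f"mode '{mode}' requires an annotation file")
        # -G quantifies against known transcripts only;
        # -g uses the annotation as a guide and also reports novel transcripts.
        cmd += ["-G" if mode == "annotation" else "-g", annotation]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if genome_fasta is not None:
        # Bias correction needs the genome sequence.
        cmd += ["-b", genome_fasta]
    cmd.append(bam)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_cufflinks_command("guided", "sample.bam", "genes.gtf", "genome.fa")))
```

The function only assembles the argument list; it does not run Cufflinks, so it can be inspected safely before passing the result to a process launcher.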

Quantify to reference (Partek E/M)

This task does not need an annotation model file, since the annotation is retrieved from the BAM files themselves. The sequence names in the BAM files are the features against which the reads are quantified.

This task is generally performed on reads aligned to a transcriptome, e.g., when a species does not have a genome reference and the BAM files contain transcriptome information. In this case, the features for the quantification task are the reference sequence names in the input BAM files.

There are two parameters in Quantify to reference (Figure 2):

 

Figure 2. Quantify to reference dialog
  • Min coverage: filters out any feature (sequence name) that has fewer reads across all samples than the specified value.
  • Strict paired-end compatibility: this only affects paired-end data. When it is checked, a read is counted only if both ends align to the same feature. Otherwise, a read is still counted as an exonic compatible read even if its mate is not compatible with the feature.
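
The two parameters above can be illustrated with a minimal sketch. It is not Partek Flow's implementation: the function names and the data layout (each read pair reduced to the feature names its two mates align to, with `None` for a mate that is unaligned or incompatible) are assumptions made for illustration. The sketch also assumes that "across all samples" means the sum of counts over samples, and that in non-strict mode a pair counts once for each feature that at least one mate aligns to.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

# One read pair: (feature of mate 1, feature of mate 2); None = no compatible alignment.
Pair = Tuple[Optional[str], Optional[str]]


def count_pairs(pairs: Iterable[Pair], strict: bool) -> Counter:
    """Count read pairs per feature, with or without strict paired-end compatibility."""
    counts: Counter = Counter()
    for mate1, mate2 in pairs:
        if strict:
            # Both ends must align to the same feature.
            if mate1 is not None and mate1 == mate2:
                counts[mate1] += 1
        else:
            # One compatible end is enough (assumed interpretation).
            for feature in {mate1, mate2} - {None}:
                counts[feature] += 1
    return counts


def filter_min_coverage(per_sample: Dict[str, Counter], min_coverage: int) -> Set[str]:
    """Keep features whose read count summed over all samples reaches min_coverage."""
    totals: Counter = Counter()
    for counts in per_sample.values():
        totals.update(counts)
    return {feature for feature, n in totals.items() if n >= min_coverage}


if __name__ == "__main__":
    pairs = [("tx1", "tx1"), ("tx1", None), ("tx1", "tx2")]
    per_sample = {
        "sampleA": count_pairs(pairs, strict=False),
        "sampleB": count_pairs(pairs, strict=True),
    }
    print(per_sample)
    print(filter_min_coverage(per_sample, min_coverage=2))
```

With strict compatibility enabled, only the first pair in the example contributes a count, so low-coverage features are more likely to fall below the Min coverage threshold.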

During quantification:

  1. We scan through each of the BAM files and find all the transcripts that meet the minimum coverage threshold.
  2. From those transcripts, we create an internal annotation in which each transcript name serves as the sequence name, the Gene ID, and the Transcript ID. The start position is 1 and the end position is the length of the transcript.
  3. In effect, this annotation filters out the low-coverage transcripts.
  4. Since we don't know where the transcripts are located in the genome, chromosome view displays only one transcript at a time (i.e., the transcript names are treated like "chromosomes").
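
Steps 1 and 2 above can be sketched as follows, assuming the reference names and lengths are read from the `@SQ` lines of the BAM header (shown here as SAM header text, using the standard `SN` and `LN` tags). The function names and the record layout are hypothetical, and the set of transcripts that passed the coverage filter is taken as an input; this is an illustration, not the Partek Flow code.

```python
from typing import Dict, List, Set


def reference_lengths(sam_header: str) -> Dict[str, int]:
    """Extract {sequence name: length} from the @SQ lines of a SAM/BAM header."""
    lengths: Dict[str, int] = {}
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type has the form TAG:value.
        fields = dict(field.split(":", 1) for field in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def build_annotation(lengths: Dict[str, int], kept: Set[str]) -> List[dict]:
    """Create one annotation record per transcript that passed the coverage filter."""
    return [
        {
            "seqname": name,        # the transcript acts as its own "chromosome"
            "start": 1,
            "end": lengths[name],   # end position = transcript length
            "gene_id": name,        # Gene ID and Transcript ID share the name
            "transcript_id": name,
        }
        for name in lengths
        if name in kept
    ]


if __name__ == "__main__":
    header = "@HD\tVN:1.6\n@SQ\tSN:tx1\tLN:1500\n@SQ\tSN:tx2\tLN:800\n"
    for record in build_annotation(reference_lengths(header), kept={"tx1"}):
        print(record)
```

Because every record spans its whole sequence, dropping a transcript from `kept` is the only way a feature disappears, which matches the observation in step 3 that the annotation acts as a coverage filter.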

The output data node displays a Task report similar to that of the Quantify to annotation model task.

References

  1. Trapnell C, Williams B, Pertea G, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotech. 2010; 28:511-515.
  2. Xing Y, Yu T, Wu YN, Roy M, Kim J, Lee C. An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs. Nucleic Acids Res. 2006; 34(10):3150-60.


Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

