Partek Flow Documentation

...

  • Full expectation-maximization (EM) algorithm (3), as implemented in Partek® Flow®, with slight modifications.

...

The estimated abundance is derived from the fragments per kilobase of transcript per million aligned reads (FPKM) or reads per kilobase of transcript per million aligned reads (RPKM) values delivered by the quantification algorithm. The concordance between known and estimated abundances is measured by r², a value ranging from 0% to 100%, where 100% corresponds to perfect estimation of abundance.
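As a rough illustration of the two quantities above, the following minimal sketch computes an RPKM value from a read count and scores concordance as a squared Pearson correlation. Treating r² as the squared Pearson correlation is an assumption, since the text does not state the exact formula or whether the values are transformed first. The function names and arguments are hypothetical and are not part of Partek Flow.

```python
def rpkm(read_count, transcript_length_bp, total_aligned_reads):
    """Reads per kilobase of transcript per million aligned reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_aligned_reads)


def r_squared(known, estimated):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this is the concordance measure meant by r2 in the text.
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    n = len(known)
    mean_k = sum(known) / n
    mean_e = sum(estimated) / n
    cov = sum((k - mean_k) * (e - mean_e) for k, e in zip(known, estimated))
    var_k = sum((k - mean_k) ** 2 for k in known)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov * cov / (var_k * var_e)
```

A perfectly proportional estimate (for example, every estimated value being twice the known value) gives an r² of 1, that is, 100%.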

The studies of Roberts et al. (4) and Li et al. (10) measured the importance of sequence and position bias correction based on qPCR, using a set of genes from the Microarray Quality Control (MAQC) project. Roberts and colleagues reported that, when both sequence and position bias are accounted for, r² goes up by about 5 percentage points, from 75.3% to 80.7%. The impact of position bias is reported to be small relative to that of sequence bias.

Li and Dewey (10) showed that the RSEM quantification package, which implements only the position bias correction, delivers an r² of 69%. For comparison, running Cufflinks on the same dataset with the position bias correction yielded an r² of only 71%. When both sequence and position bias corrections were enabled in Cufflinks, r² went up to 79%. We can conclude that, for real datasets, the impact of position bias is negligible, and the impact of sequence bias is about 5 to 8 percentage points of r².

...

Quantification was performed by Partek Flow with the EM algorithm, and the plots of estimated vs. expected transcript abundance for Mix A and Mix B are shown in Figures 1 and 2, respectively. As we can see, r² is about 97%; hence, there is very little room for improvement in abundance estimation.

Furthermore, a fold change plot is given (Figure 3), comparing the estimated and expected fold change in Mix A vs. Mix B for the four groups of transcripts with known fold change values. The fold change values were calculated based on log2-transformed RPKM values for each transcript (control).
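The fold change calculation can be sketched as a difference of log2-transformed RPKM values. This is only a minimal illustration: the function name is hypothetical, and the text does not say how zero RPKM values were handled, so the sketch simply rejects non-positive input.

```python
import math


def log2_fold_change(rpkm_mix_a, rpkm_mix_b):
    """Estimated log2 fold change of Mix A vs. Mix B for one transcript.

    Assumption: fold change is the difference of log2-transformed RPKM
    values; handling of zero RPKM is not described in the source.
    """
    if rpkm_mix_a <= 0 or rpkm_mix_b <= 0:
        raise ValueError("RPKM values must be positive for a log2 transform")
    return math.log2(rpkm_mix_a) - math.log2(rpkm_mix_b)
```

On this scale, a transcript that is four times more abundant in Mix A than in Mix B has a log2 fold change of 2, and an unchanged transcript has a value of 0.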

...

To test for possible biases in abundance estimates, we combined the approaches of Li et al. (9) and Jiang et al. (12). That amounted to regressing the estimated abundance not only on the expected abundance, but also on the transcript length, GC content, and the expected fold change, subjecting all the variables to log2 transformation.

We started with the full model containing four covariates and performed model selection based on two criteria: adjusted r² (computed for all possible models) and stepwise regression (with a cutoff p-value of 0.15). The combined approach allowed us to consider both practical and statistical significance of the covariates.
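The adjusted r² half of this selection procedure can be sketched as an exhaustive search over covariate subsets. The sketch below is an illustration under stated assumptions, not the analysis code used for the study: it fits ordinary least squares with an intercept via the normal equations, assumes all variables are already log2-transformed, and uses the standard formula adjusted r² = 1 - (1 - r²)(n - 1)/(n - p - 1). All function and covariate names are hypothetical, and the stepwise regression with the 0.15 p-value cutoff is not covered.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjusted_r2(y, columns):
    """Fit OLS of y on an intercept plus the given columns; return adjusted r2."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def rank_models(y, covariates):
    """Score every non-empty subset of covariates by adjusted r2, best first."""
    names = sorted(covariates)
    scored = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            score = adjusted_r2(y, [covariates[name] for name in subset])
            scored.append((score, subset))
    return sorted(scored, reverse=True)
```

With four covariates this evaluates 15 candidate models. Because adjusted r² penalizes each extra covariate, a covariate that adds little explanatory power tends to lower the score of the models that include it, which is what makes the criterion useful for judging practical significance.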

...