Partek Flow Documentation


...


A) Choosing factors (attributes) in GSA


...

B) Choosing comparisons in GSA

 

Based on the set of user-specified factors and comparisons, GSA considers all possible designs with the following restrictions:

...

The model with the lowest information criterion is considered the best choice. The superiority of the best model can be quantified by computing the so-called Akaike weight (Figure 3). A model's weight is interpreted as the probability that the model would be picked as the best if the study were reproduced. In the example above, we obtain 15 Akaike weights that sum to one. For instance, if the best model has an Akaike weight of 0.95, it is clearly superior to the other candidates in the model pool. If, on the other hand, the best weight is 0.52, the best model is likely to be replaced if the study were reproduced. We can still use this "best shot" model for downstream analysis, keeping in mind that the accuracy of this "best shot" is fairly low.

Figure 3: For each feature, Akaike weights and other statistics are available via the "View extra details report" button.
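Akaike weights follow a standard formula: with delta_i = IC_i - IC_min, each model's relative likelihood is exp(-delta_i / 2), and the weights are these values normalized to sum to one. The sketch below illustrates that formula only; it assumes the information criterion values for the candidate models are already available, the values used are made up, and it is not GSA's own code.

```python
import math

def akaike_weights(ic_values):
    """Convert information criterion values (e.g. AIC) into Akaike weights.

    A smaller IC is better; the returned weights are positive and sum to one.
    """
    best = min(ic_values)
    # Relative likelihood of each model: exp(-delta / 2), with delta = IC - IC_min
    rel = [math.exp(-(ic - best) / 2.0) for ic in ic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical IC values for three candidate designs of a single feature
weights = akaike_weights([100.0, 106.0, 110.0])
print([round(w, 3) for w in weights])
```

In this made-up case the best model receives a weight of roughly 0.95, which in the terms above would make it clearly superior to the other candidates.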

Obtaining reproducible results in GSA

...

When "Lognormal with shrinkage" is enabled, a separate shrinkage plot is displayed for each design (Figure 4). First, a lognormal linear model is fitted to each gene separately, and the standard deviations of the residual errors are obtained (green dots in the plot). Applying shrinkage takes two more steps. We look at how the error terms change with average gene expression and estimate the corresponding trend (black curve). Finally, the original error terms are adjusted (shrunk) towards the trend (red dots). The adjusted error terms are plugged back into the lognormal model to obtain the reported results, such as p-values.

Figure 4: Shrinkage plot for a two-group study with four observations per group. Blue arrows show how the error terms for transcripts ERCC-00046 and ERCC-00054 are adjusted up and down, respectively.

 

All other things being equal, the comparison p-value goes up as the magnitude of the error term goes up, and vice versa. As a result, the "shrunken" p-value goes up (down) if the error term is adjusted up (down). Table 1 reports some results for the two features highlighted in Figure 4.

Table 1: After shrinkage is applied, p-values are adjusted in the same direction as the corresponding error terms.
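For a simple two-group comparison, the link between the error term and the p-value can be written out explicitly. Assume, as an illustrative simplification that is not necessarily GSA's exact computation, that the lognormal model reduces to a t test on log-scale values, with group means $\bar{y}_1, \bar{y}_2$, group sizes $n_1, n_2$, residual standard deviation $s$, and residual degrees of freedom $\nu$:

```latex
t = \frac{\bar{y}_1 - \bar{y}_2}{s\,\sqrt{1/n_1 + 1/n_2}}, \qquad
p = 2\,P\left(T_\nu > |t|\right)
```

Holding the group means and $\nu$ fixed, a larger $s$ gives a smaller $|t|$ and therefore a larger $p$. Adjusting $s$ up or down through shrinkage thus moves the p-value in the same direction.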

 

With a large sample size, the amount of shrinkage is small (Figure 5), and the "Lognormal" and "Lognormal with shrinkage" p-values become virtually identical.
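The reason the adjustment fades with sample size can be seen from the weighted-average form of trend-based shrinkage used in limma-style methods. The raw error variance receives weight d / (d0 + d), where d is the residual degrees of freedom and d0 is the prior degrees of freedom. The sketch below assumes d0 = 4 purely for illustration; in practice this quantity is estimated from the data, and the function name is hypothetical.

```python
def raw_variance_weight(resid_df, prior_df=4.0):
    """Weight given to a feature's own error variance in limma-style shrinkage.

    The remaining weight, prior_df / (prior_df + resid_df), goes to the trend.
    prior_df = 4 is an ASSUMED value, used for illustration only.
    """
    return resid_df / (prior_df + resid_df)

# Two groups: residual df = total number of samples - 2
small = raw_variance_weight(4 + 4 - 2)    # four observations per group
large = raw_variance_weight(40 + 40 - 2)  # about 40 samples per group
print(small, large)
```

Under this assumption, with four observations per group the trend receives 40% of the weight. With about 40 per group it receives less than 5%, so the green and red dots nearly coincide.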

 

...

Figure 5: Shrinkage plot for a two-group study with about 40 samples per group. Thanks to the large sample size, the error terms are barely adjusted (green and red dots almost coincide).

 

One important use of the shrinkage plot is choosing a meaningful low expression threshold in the Low expression filter section (Figure 6). For features with low expression, the proportion of zero counts is high. Such features are less likely to be of interest in the study, and in any case they cannot be modeled well by a continuous distribution such as Lognormal. Note that adding a positive offset to get rid of zeros does not help, because it does not affect the error term of a lognormal model much. A high proportion of zeros can ultimately cause the trend to drop in the leftmost part of the shrinkage plot (Figure 5).

A rule of thumb suggested by the limma authors is to set the low expression threshold so as to remove the drop and obtain a monotone decreasing trend in the left-hand part of the plot.

Figure 6: A meaningful value for the "Lowest average coverage" threshold can be easily determined based on the shrinkage

...

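The rule of thumb can be turned into a number in a simple way. If the trend rises at the far left before it starts to decrease, the x value at that first peak is a candidate threshold, since removing everything to its left leaves a monotone decreasing left-hand part. The helper below is hypothetical (it is not a GSA option) and assumes a smooth fitted trend already evaluated on a sorted grid of x = log2(average coverage) values. On a noisy curve, reading the threshold off the plot by eye serves the same purpose.

```python
def threshold_at_trend_peak(xs, trend):
    """Return the log2(average coverage) value where the trend stops rising.

    xs must be sorted in increasing order; trend[i] is the fitted trend at xs[i].
    If the trend already decreases from the left edge, the leftmost x is returned,
    meaning no extra filtering is suggested.
    """
    i = 0
    # Walk right while the trend is still rising (the "drop" region at the far left)
    while i + 1 < len(trend) and trend[i + 1] > trend[i]:
        i += 1
    return xs[i]

# Hypothetical trend values that rise until x = 2 and then decrease
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
trend = [0.5, 0.7, 0.9, 1.0, 1.05, 0.9, 0.8]
print(threshold_at_trend_peak(xs, trend))
```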

 

For instance, in Figure 5 it looks like a threshold of 2 would achieve this. Since the x axis is on the log2 scale, the corresponding value for "Lowest average coverage" is 2^2 = 4 (Figure 6). After we set the filter accordingly and rerun GSA, the shrinkage plot takes the required form (Figure 7).

Figure 7: After resetting the "Average coverage" threshold to 4 (Figure 6), the left part of the shrinkage plot displays the desired monotone decreasing trend. Note that the left boundary of the x axis becomes log2(4) = 2.
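Because the x axis is log2(average coverage), a threshold read off the plot is converted back with a power of two before it is entered as "Lowest average coverage". The sketch below does that conversion and then applies an average-coverage filter to a toy count table. The function name and the inclusive comparison (>=) are assumptions, since the exact filtering rule used by GSA is not specified here.

```python
import math
import statistics

def filter_by_average_coverage(counts, log2_threshold):
    """Hypothetical filter: keep features whose average coverage is >= 2 ** log2_threshold.

    counts: dict mapping feature name -> list of per-sample counts.
    Returns (coverage_threshold, kept_feature_names).
    """
    coverage_threshold = 2 ** log2_threshold  # e.g. 2 ** 2 = 4
    kept = [name for name, row in counts.items()
            if statistics.fmean(row) >= coverage_threshold]
    return coverage_threshold, kept

toy_counts = {
    "gene_a": [0, 1, 0, 2],      # average 0.75 -> removed
    "gene_b": [3, 5, 4, 6],      # average 4.5 -> kept
    "gene_c": [20, 25, 18, 30],  # average 23.25 -> kept
}
threshold, kept = filter_by_average_coverage(toy_counts, 2)
# Every kept feature sits at x >= log2(4) = 2 on the shrinkage plot
print(threshold, kept, min(math.log2(statistics.fmean(toy_counts[k])) for k in kept))
```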

 

Note that a similar effect can be achieved by increasing the threshold for "Lowest maximal coverage", "Minimum coverage", or any similar filtering option (Figure 6). However, using "Average coverage" is the most straightforward: the shrinkage procedure uses log2(Average coverage) as the independent variable when fitting the trend, so the x axis of the shrinkage plot is always log2(Average coverage), regardless of the filtering option chosen in Figure 6.
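A small example shows why other filters do not map cleanly onto the x axis. Here a filter on maximal coverage stands in for "Lowest maximal coverage" (its exact semantics in GSA are assumed). Such a filter can keep a feature whose average coverage, and therefore its position on the shrinkage plot, is still below the intended cut-off.

```python
import math
import statistics

threshold = 4
row = [0, 0, 0, 8]  # hypothetical feature: mostly zeros, one high count

passes_max_filter = max(row) >= threshold               # True: the maximum is 8
passes_avg_filter = statistics.fmean(row) >= threshold  # False: the average is 2
x_position = math.log2(statistics.fmean(row))           # 1.0, left of log2(4) = 2

print(passes_max_filter, passes_avg_filter, x_position)
```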

...

That line of reasoning suggests that neither DESeq2 nor limma is perfectly equipped to deal with abnormal features. In fact, "limma trend" has no way to deal with them at all: shrinkage is applied regardless. If such abnormality is coupled with a low level of expression, it can be a good idea to remove the outlying features by raising the low expression threshold. For instance, while the trend in Figure 8A is monotone and decreasing in the left-hand part of the plot, there are many low expression features with abnormally low error terms. Unless we have a special interest in those features, it makes sense to raise the low expression threshold to remove them.
Figure 8:

A) The average expression threshold can be raised to remove low expression features with abnormal error terms (circled in blue)


B) Six low expression features (circled in blue) account for a very sharp increase in the trend which can have an unduly large effect on overall results

There can also be situations where a small number of low expression features have a very high influence on the trend, which in turn affects the p-values of all features (Figure 8B). It is reasonable to assume that the overall results should not be sensitive to the presence or absence of a few features, especially if they happen to have low expression. It makes sense to remove such influential points by increasing the threshold accordingly.

Speaking of higher expression features, GSA currently has no automatic method to separate "abnormal" from "normal" features, so the user has to inspect the shrinkage plot by eye. However, for investigating standalone outliers, GSA can quantify the benefit of shrinkage in a well-grounded way. To do so, enable both Lognormal and Lognormal with shrinkage in Advanced Options (Figure 9).


Figure 9: To quantify the benefit of shrinkage for any particular feature, enable these two models in "Custom" mode.

...