Page History
...
Table of Contents | ||||||
---|---|---|---|---|---|---|
|
GSA dialog
The first step of GSA is to choose which attributes to include in the test (Figure 1). All sample attributes including numeric and categorical attributes are displayed in the dialog, so use the check button to select between them. An experiment with two attributes Cell type (with groups A and B)and Time (time points 0, 5, 10) is used as an example in this section.
...
If advanced normalization needs to be applied, perform the Normalize counts task on a quantification data node before doing differential expression detection (GSA or ANOVA).
GSA advanced options
Click on Configure to customize Advanced options (Figure 7).
...
Numbered figure captions | ||||
---|---|---|---|---|
| ||||
Low-expression feature
Low -expression feature section allows you to specify criteria to exclude features that do not meet requirements for the calculation.
- Lowest average coverage: the computation will exclude a feature if its geometric mean across all samples is below than the specified value
- Lowest maximum coverage: the computation will exclude a feature if its maximum across all samples is below the specified value
- Minimum coverage: the computation will exclude a feature if its sum across all samples is below than the specified value
- None: include all features in the computation
Multiple test correction
Multiple test correction can be performed on the p-values of each comparison, with FDR step-up being the default (1). If you check the Storey q-value (2), an extra column with q-values will be added to the report.
Report option
This section configures how to select the best model for a feature. There are two options for Model selection criterion: AICc (Akaike Information Criterion corrected) and AIC (Akaike Information Criterion). AICc is recommended for small sample size, while AIC is recommended for medium and large sample size What about large samples?(3). Note that when sample size grows from small to medium, AICc converges to AIC. Taking the AICc/AIC value into account, GSA considers the model with the lowest information criterion as the best choice.
...
There are situations when a model estimation procedure does not outright fail, but still encounters some difficulties. In this case, it can even generate p-value and fold change for the comparisons, but those values are not reliable, and can be misleading. It is recommended to use only reliable estimation results, so the default option for Use only reliable estimation results is set Yes.
Model types configuration
Partek® Flow® provides five response distribution types for each design model in the pool, namely:
...
Partek Flow keeps tracking the log status of the data, and no matter whether GSA is performed on logged data or not, the fold change calculation is always in linear scale
GSA report
If there are multiple design models and multiple distribution types included in the calculation, the fraction of genes using each model and type will be displayed as pie charts in the task result (Figure 9).
...
Numbered figure captions | ||||
---|---|---|---|---|
| ||||
References
- Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, JRSS, B, 57, 289-300.
- Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035.
- Auer, 2011, A two-stage Poisson model for testing RNA-Seq
- Burnham, Anderson, 2010, Model selection and multimodel inference
- Law C, Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 2014 15:R29.
- http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#cuffdiff-output-files
- Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biology, 2010
...