What is Correlation analysis?

Correlation analysis is a statistical test that lets you rank features by their correlation with numeric attributes using Pearson (linear), Spearman (rank), or Kendall (tau) correlation. 

Running Correlation analysis

We recommend normalizing your data prior to running Correlation analysis, but it can be invoked on any counts data node.

Only numeric attributes are available as factors.

Correlation analysis produces a Feature list data node. The task report (Figure 1) is similar to the ANOVA and GSA task reports and includes a table with features on rows and statistical results on columns.

Each numeric attribute includes a p-value column, adjusted p-value columns (FDR step-up and/or Storey q-value, if included), and a correlation value. Each interaction has p-value and adjusted p-value columns (FDR step-up and/or Storey q-value, if included).

Each feature includes chromosome view, dot plot, correlation plot, and extra details buttons.

Correlation analysis advanced options

Low-value filter

The low-value filter lets you specify criteria for excluding features that do not meet the requirements for the calculation. If a filter features task was performed upstream in the analysis, the default for this filter is None; otherwise, the default is Lowest average coverage, set to 1.

Lowest average coverage: the computation will exclude a feature if its geometric mean across all samples is below the specified value

Lowest maximum coverage: the computation will exclude a feature if its maximum across all samples is below the specified value

Minimum coverage: the computation will exclude a feature if its sum across all samples is below the specified value

None: include all features in the computation
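
As an illustration only, the four filter criteria above can be sketched in Python. The function names, the mode strings, and the treatment of zero counts (a zero makes the geometric mean zero) are assumptions made for this sketch and do not describe how the software implements the filter.

```python
import math

def geometric_mean(values):
    # Assumption for this sketch: any non-positive value gives a
    # geometric mean of 0, so such a feature fails a positive threshold.
    if any(v <= 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def passes_low_value_filter(values, mode="lowest_average", threshold=1.0):
    """Return True if a feature is kept under the chosen criterion."""
    if mode == "none":
        return True  # None: include all features
    if mode == "lowest_average":
        statistic = geometric_mean(values)  # geometric mean across samples
    elif mode == "lowest_maximum":
        statistic = max(values)             # maximum across samples
    elif mode == "minimum":
        statistic = sum(values)             # sum across samples
    else:
        raise ValueError(f"unknown mode: {mode}")
    # A feature is excluded only when the statistic is below the threshold.
    return statistic >= threshold

# Hypothetical counts: one list of per-sample values for each feature.
counts = {
    "geneA": [10, 12, 8, 15],
    "geneB": [0, 0, 1, 0],
    "geneC": [0, 0, 0, 40],
}
kept = [name for name, values in counts.items()
        if passes_low_value_filter(values, "lowest_average", 1.0)]
print(kept)
```

With these hypothetical counts, only geneA passes Lowest average coverage at 1, while geneC would also pass Lowest maximum coverage at 5 because a single sample has a high count.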

Multiple test correction

Multiple test correction can be performed on the p-values of each comparison, with FDR step-up being the default. If you check the Storey q-value, an extra column with q-values will be added to the report.
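
To show what an adjusted p-value column contains, here is a minimal sketch that assumes FDR step-up refers to the Benjamini-Hochberg procedure. The function name `fdr_step_up` and the example p-values are hypothetical; the Storey q-value is not covered here.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from largest to smallest p-value.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        candidate = p_values[index] * m / rank
        # Taking the running minimum keeps adjusted values monotonic
        # and caps them at 1.
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted

# Hypothetical p-values for four features.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = fdr_step_up(p_values)
print([round(value, 4) for value in adjusted])
```

Each adjusted value is at least as large as the raw p-value, and features whose adjusted value falls below the chosen FDR cutoff are the ones reported as significant.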

Use only reliable estimation results

There are situations in which a model estimation procedure does not fail outright but still encounters difficulties. In such cases, it can even generate p-values and fold changes for the comparisons, but these are not reliable, i.e., they can be misleading. Therefore, the default for Use only reliable estimation results is set to Yes.

Correlation type

Sets the type of correlation used to calculate the correlation coefficient and p-value. Options are Pearson (linear), Spearman (rank), and Kendall (tau). Default is Pearson (linear).
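
The difference between the three options can be seen in a small, self-contained sketch. The helper functions and the example data below are illustrative assumptions, not the software's implementation; in particular, the Kendall function computes the simple tau-a variant, which assumes no tied values, and p-values are not computed.

```python
import math

def pearson(x, y):
    # Linear correlation: covariance scaled by both standard deviations.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Ranks starting at 1, with ties given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    # Tau-a: (concordant pairs - discordant pairs) / total number of pairs.
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical numeric attribute and one feature's values across five samples.
attribute = [1, 2, 3, 4, 5]
expression = [1, 4, 9, 16, 25]  # increases monotonically, but not linearly

print(pearson(attribute, expression))
print(spearman(attribute, expression))
print(kendall_tau_a(attribute, expression))
```

Because the relationship in this example is monotonic but not linear, the Pearson coefficient falls slightly below 1, whereas the Spearman and Kendall coefficients both equal 1. This is the practical reason to choose a rank-based option when a linear relationship cannot be assumed.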