Partek Flow Documentation

...

Graph-based clustering produces a Clustering result data node. The task report lists the cluster results, cluster statistics, and top marker features per cluster (Figure 1). If clustering was run with Split cells by sample enabled on a single cell counts data node, the table displays the number of clusters found for each sample, and clicking the sample name opens the sample-level report.

...

The Maximum modularity is a measure of the quality of the clustering result. Modularity measures how similar cells/samples within a cluster are to each other and how dissimilar they are to cells/samples in other clusters. Higher modularity indicates a better result. Optimal modularity is 1.
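As an illustration of what a modularity score captures, the sketch below computes the standard Newman modularity for a small, made-up graph. It assumes an undirected, unweighted graph given as an edge list; the function name and the toy data are hypothetical, and whether Partek Flow computes its reported value in exactly this way is not stated in this documentation.

```python
from collections import defaultdict

def modularity(edges, membership):
    """Newman modularity for an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> cluster label.
    """
    m = len(edges)
    intra = defaultdict(int)           # edges with both ends in the same cluster
    cluster_degree = defaultdict(int)  # sum of node degrees per cluster
    for u, v in edges:
        cluster_degree[membership[u]] += 1
        cluster_degree[membership[v]] += 1
        if membership[u] == membership[v]:
            intra[membership[u]] += 1
    return sum(
        intra[c] / m - (cluster_degree[c] / (2 * m)) ** 2
        for c in cluster_degree
    )

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

q_two = modularity(edges, two_clusters)  # one cluster per triangle
q_one = modularity(edges, one_cluster)   # everything lumped together
```

In this toy case, the partition that places each triangle in its own cluster scores higher than the partition that puts every node in a single cluster, which matches the intuition that densely connected groups should be kept together and separated from each other.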

Cluster statistics

The total number of clusters is listed along with the number and percentage of observations (cells/samples) in each cluster. 

...

Biomarkers for each cluster are calculated using an ANOVA test where each cluster is compared to the other cells/samples in the data set. Genes with fold-change > 1.5 are included and sorted by ascending p-value (ties broken by greater fold change). The top 10 genes for each cluster are shown in the table. The full ANOVA results can be obtained by clicking the Run ANOVA button, which will generate a Feature list data node.
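The filtering and ordering rule described above can be sketched as follows. This is only an illustration of the ranking logic, not the ANOVA itself; the function name, the record layout, and the gene names and values are hypothetical.

```python
def top_markers(results, min_fold_change=1.5, n_top=10):
    """Return the top marker genes for one cluster.

    results: list of dicts with keys 'gene', 'fold_change', and 'p_value'
    (a hypothetical layout for per-gene test results).
    """
    # Keep only genes whose fold change exceeds the threshold.
    kept = [r for r in results if r["fold_change"] > min_fold_change]
    # Sort by ascending p-value; break ties by greater fold change.
    kept.sort(key=lambda r: (r["p_value"], -r["fold_change"]))
    return [r["gene"] for r in kept[:n_top]]

# Made-up results for a single cluster.
example = [
    {"gene": "G1", "fold_change": 2.0, "p_value": 0.01},
    {"gene": "G2", "fold_change": 3.0, "p_value": 0.01},
    {"gene": "G3", "fold_change": 1.2, "p_value": 0.0001},
    {"gene": "G4", "fold_change": 5.0, "p_value": 0.001},
]

markers = top_markers(example)  # ['G4', 'G2', 'G1']
```

G3 is dropped despite having the smallest p-value because its fold change does not exceed 1.5, and G2 is listed before G1 because the two share a p-value and G2 has the greater fold change.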

Basic Graph-based clustering parameters

Clustering algorithm

Choose which version of the Louvain clustering algorithm to use. Options are Louvain [1], Louvain with refinement [2], and SLM [3]. The most recent version is Smart Local Moving (SLM). The default is Louvain.

Split cells by sample

Choose whether to run Graph-based clustering on all samples together or on each sample individually.

Checking the box will run Graph-based clustering on each sample individually.

Include features where "Feature type" is 

This option appears when there are multiple feature types in the input data node (e.g., CITE-Seq data). 

Select Any to run on all features or pick a feature type.

Advanced Graph-based clustering parameters

Resolution

To increase the number of clusters, increase the resolution (Figure 2). To decrease the number of clusters, decrease the resolution. Default is 1. 
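To give some intuition for why a higher resolution yields more clusters, the sketch below uses a commonly used resolution-weighted form of modularity, in which the resolution multiplies the expected-edges term. This form is an assumption made for illustration; the documentation does not state the exact objective Partek Flow optimizes. The code does not run Louvain; it only scores three fixed candidate partitions of a made-up graph and reports which one scores best at a given resolution.

```python
from collections import defaultdict

def modularity(edges, membership, resolution=1.0):
    """Resolution-weighted modularity (assumed form) for an undirected graph."""
    m = len(edges)
    intra = defaultdict(int)
    cluster_degree = defaultdict(int)
    for u, v in edges:
        cluster_degree[membership[u]] += 1
        cluster_degree[membership[v]] += 1
        if membership[u] == membership[v]:
            intra[membership[u]] += 1
    return sum(
        intra[c] / m - resolution * (cluster_degree[c] / (2 * m)) ** 2
        for c in cluster_degree
    )

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Candidate partitions, keyed by their number of clusters.
candidates = {
    1: {n: 0 for n in range(6)},                # everything together
    2: {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1},    # one cluster per triangle
    6: {n: n for n in range(6)},                # every node on its own
}

def preferred_cluster_count(resolution):
    """Number of clusters in the best-scoring candidate at this resolution."""
    return max(
        candidates,
        key=lambda k: modularity(edges, candidates[k], resolution),
    )

low = preferred_cluster_count(0.2)      # 1
default = preferred_cluster_count(1.0)  # 2
high = preferred_cluster_count(3.0)     # 6
```

As the resolution grows, the penalty on large clusters grows with it, so partitions with more, smaller clusters begin to score best.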

...

If NN-Descent is chosen for Nearest Neighbor Type, the metric to use when determining distance between data points in high dimensional space can be set. Options are Euclidean, Manhattan, Chebyshev, Canberra, Bray Curtis, and Cosine. Default is Euclidean. 
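For reference, the listed metrics can be written out using their standard textbook definitions, as in the sketch below. The function names and example vectors are hypothetical, and details such as how zero denominators are handled may differ from the conventions used inside Partek Flow.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Terms where both coordinates are zero are skipped (treated as 0).
    return sum(
        abs(x - y) / (abs(x) + abs(y))
        for x, y in zip(a, b)
        if abs(x) + abs(y) > 0
    )

def bray_curtis(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(abs(x + y) for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Two made-up points in a three-dimensional space.
p = [1.0, 2.0, 3.0]
q = [4.0, 6.0, 3.0]

distances = {
    "Euclidean": euclidean(p, q),      # 5.0
    "Manhattan": manhattan(p, q),      # 7.0
    "Chebyshev": chebyshev(p, q),      # 4.0
    "Canberra": canberra(p, q),
    "Bray Curtis": bray_curtis(p, q),
    "Cosine": cosine(p, q),
}
```

Euclidean, Manhattan, and Chebyshev respond to absolute differences, Canberra and Bray Curtis scale differences by the magnitude of the values, and Cosine depends only on the angle between the two vectors.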

PCA: Number of principal components 

Graph-based clustering uses principal components as its input. The number of principal components to use is set here. 

We recommend using the PCA task to determine the optimal number of principal components for your data. Default is 100.

PCA: Features contribute 

Options are equally or by variance. Feature values can be standardized prior to PCA so that the contribution of each feature does not depend on its variance. To standardize, choose equally. To take variance into account and focus on the most variable features, choose by variance. Default is by variance.
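The effect of standardizing can be seen in a small sketch. It assumes standardization means scaling each feature to mean 0 and variance 1 (a z-score), which is the usual meaning but is not spelled out in this documentation; the function name and the values are hypothetical.

```python
import statistics

def standardize(values):
    """Scale one feature to mean 0 and (population) variance 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant feature carries no variance; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

# Made-up values for two features across four cells.
high_variance_gene = [0.0, 100.0, 200.0, 300.0]
low_variance_gene = [1.0, 1.1, 1.2, 1.3]

raw_variances = [
    statistics.pvariance(g) for g in (high_variance_gene, low_variance_gene)
]
scaled_variances = [
    statistics.pvariance(standardize(g))
    for g in (high_variance_gene, low_variance_gene)
]
```

Before scaling, the first feature's variance is far larger than the second's, so it would dominate the leading principal components (the by variance behavior). After scaling, both features have the same variance and contribute on equal footing (the equally behavior).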

Normalization: Log transform data

You can choose to log transform the data prior to running PCA as part of Graph-based clustering. Default is disabled.

Normalization: Log base

If you are normalizing the data, choose a log base. Default is 2 when Log transform data is enabled.

Normalization: Log offset

If you are normalizing the data, choose an offset. Default is 1 when Log transform data is enabled.
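A minimal sketch of how the log base and log offset are typically combined is shown below. It assumes the transform is log_base(value + offset), which is the common convention but is not stated explicitly here; the function name and input values are hypothetical.

```python
import math

def log_transform(values, base=2, offset=1):
    """Apply log_base(value + offset) to each value (assumed form)."""
    return [math.log(v + offset, base) for v in values]

# Made-up counts; the offset keeps zero counts defined after the transform.
counts = [0, 1, 3, 7]
transformed = log_transform(counts)  # approximately [0, 1, 2, 3]
```

With the defaults of base 2 and offset 1, a count of 0 maps to 0, so cells with no reads for a feature remain at zero after the transform.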

References

[1] Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

...