
Next, we will perform exploratory analysis on the merged mRNA and protein expression data and visualize it in preparation for identifying cell populations.

PCA

Because the merged count matrix has thousands of features, it is a good idea to reduce the dimensionality of the data for more efficient downstream processing.

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click PCA
Click Finish to run the PCA with default settings (Figure ?)

Flow Documentation > Dimensionality Reduction and Clustering > PCA_default_settings.png

A PCA task node will be added to the pipeline under the Analyses tab and a circular PCA output data node will be produced (Figure ?)

Flow Documentation > Dimensionality Reduction and Clustering > PCA_task_output_node.png
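For readers who want to see conceptually what a PCA computes, the following is a minimal NumPy sketch. It is not Partek Flow's implementation; the function name `pca`, the toy matrix, and the choice of a plain SVD on mean-centered data are assumptions made for illustration only.

```python
import numpy as np

def pca(matrix, n_components):
    """Return (scores, eigenvalues) for a cells x features matrix."""
    # Center each feature so the PCs describe variance around the mean.
    centered = matrix - matrix.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes, ordered by singular value.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Eigenvalues of the sample covariance matrix, largest first.
    eigenvalues = (s ** 2) / (matrix.shape[0] - 1)
    # Project the cells onto the leading principal axes.
    scores = centered @ vt[:n_components].T
    return scores, eigenvalues[:n_components]

# Hypothetical toy data: 200 cells by 50 features of random values.
rng = np.random.default_rng(0)
toy_counts = rng.normal(size=(200, 50))
scores, eigenvalues = pca(toy_counts, n_components=10)
```

Here `scores` holds one row per cell and one column per PC, and `eigenvalues` holds the variance captured by each PC, which is the quantity a Scree plot displays.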

Once the task completes, we will inspect the results to decide the optimal number of principal components (PCs) to use in downstream analyses. To do this, we will use a Scree plot.

Double click the PCA data node to open the task report

The PCA plot will open in a new data viewer session. A 3D scatterplot will be displayed on the canvas (Figure ?)

Flow Documentation > Dimensionality Reduction and Clustering > Merged_PCA.png

Click and drag the Scree plot from the Available plots card on the left onto the canvas
Drop it over the Replace option (Figure ?)

Flow Documentation > Dimensionality Reduction and Clustering > Scree_plot_click_and_drag.png

Select PCA as data for the new Scree plot

Flow Documentation > Dimensionality Reduction and Clustering > Select_PCA_data_node_for_Scree_plot.png

The Scree plot (Figure ?) shows the eigenvalue of each of the 100 PCs, with the PCs on the x-axis and the eigenvalues on the y-axis. The higher the eigenvalue, the more variance that PC explains. Typically, after an initial set of highly informative PCs, each additional component explains very little additional variance. By identifying the point where the Scree plot levels off, you can choose a suitable number of PCs to use in downstream analysis steps like graph-based clustering and UMAP.

Flow Documentation > Dimensionality Reduction and Clustering > Scree_plot_CITE-Seq.png
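To make the link between eigenvalues and variance explained concrete, here is a small sketch in plain Python. The function name `variance_explained` and the eigenvalues are hypothetical. Note that the fractions are relative to the sum of the eigenvalues supplied, so if only the leading PCs are provided they describe the share of variance within those PCs rather than within the full data set.

```python
def variance_explained(eigenvalues):
    """Return per-PC and cumulative fractions of the supplied variance."""
    total = sum(eigenvalues)
    fractions = [value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for fraction in fractions:
        running += fraction
        cumulative.append(running)
    return fractions, cumulative

# Hypothetical eigenvalues, sorted from largest to smallest.
toy_eigenvalues = [8.0, 4.0, 2.0, 1.0, 0.5, 0.5]
fractions, cumulative = variance_explained(toy_eigenvalues)
```

In this toy example the first PC accounts for half of the supplied variance, and the cumulative curve flattens quickly, which is the pattern a Scree plot shows when it levels off.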

Click and drag over the first set of PCs to zoom in (Figure ?)

Flow Documentation > Dimensionality Reduction and Clustering > Scree_plot_CITE-Seq_zoom_in.png

Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure ?)

In this data set, a reasonable cut-off could be set anywhere from roughly 10 to 30 PCs. We will use 15 in downstream steps.

Flow Documentation > Dimensionality Reduction and Clustering > Scree_plot_PC15.png
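The tutorial picks the cut-off by eye, and that remains the recommended approach here. As a rough illustration of how the same judgment could be approximated numerically, the sketch below counts the leading PCs whose eigenvalue is at least a chosen fraction of the largest one. The function name `suggest_cutoff`, the `min_fraction` parameter, and the toy eigenvalues are all hypothetical, and this heuristic is not something Partek Flow is known to apply.

```python
def suggest_cutoff(eigenvalues, min_fraction=0.1):
    """Count leading PCs whose eigenvalue is >= min_fraction of the largest.

    Assumes eigenvalues are sorted from largest to smallest.
    """
    threshold = min_fraction * eigenvalues[0]
    count = 0
    for value in eigenvalues:
        if value < threshold:
            break
        count += 1
    return count

# Hypothetical eigenvalues, sorted from largest to smallest.
toy_eigenvalues = [8.0, 4.0, 2.0, 1.0, 0.5, 0.5]
n_pcs = suggest_cutoff(toy_eigenvalues, min_fraction=0.1)
```

A stricter `min_fraction` keeps fewer PCs and a looser one keeps more, which mirrors the range of defensible choices on a real Scree plot.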

Graph-based clustering

We can use Graph-based clustering to group similar cells together in an unsupervised manner.

Click the project name near the top to go back to the Analyses tab
Click the circular PCA data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Set the number of principal components to 15 (Figure ?)
Click Finish to run the task

Flow Documentation > Dimensionality Reduction and Clustering > Graph-based clustering.png

A Graph-based clustering task node will be added to the pipeline under the Analyses tab and a circular Graph-based clusters output data node will be produced (Figure ?)

Flow Documentation > Dimensionality Reduction and Clustering > Graph-based-clustering_output.png
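The general idea behind clustering on a graph can be sketched as follows: connect each cell to its nearest neighbors in PC space, then group cells using the resulting graph. The sketch below is a deliberately simplified stand-in that labels connected components of a k-nearest-neighbor graph. Production tools typically apply a community detection algorithm to such a graph instead, and this section does not state which algorithm Partek Flow uses. All function names, the value of `k`, and the toy data are hypothetical.

```python
import numpy as np

def knn_edges(scores, k):
    """Return (cell, neighbor) pairs linking each cell to its k nearest cells."""
    diffs = scores[:, None, :] - scores[None, :, :]
    dist = (diffs ** 2).sum(axis=2)   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)    # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(scores)) for j in neighbors[i]]

def connected_components(n_nodes, edges):
    """Label nodes by connected component using union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n_nodes)]

# Hypothetical toy data: two well-separated groups of 30 cells, 15 PCs each.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=0.5, size=(30, 15))
group_b = rng.normal(loc=10.0, scale=0.5, size=(30, 15))
toy_scores = np.vstack([group_a, group_b])
labels = connected_components(len(toy_scores), knn_edges(toy_scores, k=5))
```

Because the two toy groups are far apart, no cell in one group is ever a nearest neighbor of a cell in the other, so the groups never share a cluster label. Real single-cell data is rarely this cleanly separated, which is why community detection is preferred over plain connected components.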

UMAP

Once the graph-based clustering task has completed, we can visualize the results with a UMAP plot.

Click the circular Graph-based clusters data node
Click Exploratory analysis in the toolbox
Click UMAP
Set the number of principal components to 15 (Figure ?)
Click Finish to run the task

Flow Documentation > Dimensionality Reduction and Clustering > UMAP_task_set_up.png

A UMAP task node will be added to the pipeline under the Analyses tab and a circular UMAP output data node will be produced (Figure ?)

Flow Documentation > Dimensionality Reduction and Clustering > UMAP_task_output.png

Notes on Performing Exploratory Analysis with Protein or Gene Expression Data Only