PCA
Next, we will perform exploratory analysis on the merged mRNA and protein expression data and visualize it to prepare for identifying cell populations. Because the merged count matrix has thousands of features, it is a good idea to reduce the dimensionality of the data so that downstream processing is more efficient.
- Click the Merged counts data node
- Click Exploratory analysis in the toolbox
- Click PCA
- Click Finish to run the PCA with default settings (Figure 1)
- Double click the PCA data node to open the task report
The PCA plot will open in a new data viewer session. A 3D scatterplot will be displayed on the canvas (Figure 3).
- Click and drag the Scree plot from New plot under Setup on the left onto the canvas
- Drop it over the Replace option (Figure 4)
- Select PCA as the data source for the new Scree plot (Figure 5)
- Click and drag over the first several PCs to zoom in (Figure 7)
- Hover over the Scree plot to find the point beyond which additional PCs capture little additional variance (Figure 8)
In this data set, a reasonable cut-off lies anywhere between about 10 and 30 PCs. We will use 15 PCs in the downstream steps.
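The scree plot ranks PCs by the variance each one captures, and the cut-off is the point where that per-PC contribution flattens out. As a rough illustration only, the sketch below formalizes "little additional information" as a hypothetical minimum-gain rule: keep PCs until one explains less than a chosen percentage of the total variance. The function names and the `min_gain` threshold are assumptions for illustration; Partek Flow leaves the choice of cut-off to you.

```python
def variance_explained(eigenvalues):
    """Return the percentage of total variance explained by each PC."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def min_gain_cutoff(eigenvalues, min_gain=1.0):
    """Hypothetical rule: keep PCs up to (not including) the first PC that
    explains less than `min_gain` percent of the total variance.

    `eigenvalues` must be sorted in decreasing order, as on a scree plot.
    Always keeps at least one PC.
    """
    pct = variance_explained(eigenvalues)
    for i, p in enumerate(pct):
        if p < min_gain:
            return max(i, 1)
    return len(pct)


# Example with made-up eigenvalues (not from this data set).
eigs = [50, 20, 10, 8, 5, 3, 2, 1, 0.5, 0.5]
print(min_gain_cutoff(eigs, min_gain=1.0))
```

Different thresholds move the cut-off, which is why a range such as 10 to 30 PCs can be equally defensible; the plot is a guide, not an exact answer.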
Graph-based clustering
We can use Graph-based clustering to group similar cells together in an unsupervised manner.
- Click the project name near the top to go back to the Analyses tab
- Click the circular PCA data node
- Click Exploratory analysis in the toolbox
- Click Graph-based clustering
- Click Compute biomarkers to enable it
- Set the number of principal components to 15 (Figure 9)
- Click Configure under Advanced options and change the Resolution to 1.0
- Click Finish to run the task
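Graph-based clustering methods such as Louvain (assumed here; this tutorial does not name the algorithm the task uses) connect each cell to its nearest neighbors on the selected PCs and then search for a partition of that graph with high modularity. In the common resolution-weighted form of modularity, the Resolution parameter is the factor gamma that scales the expected-edges penalty: larger values favor more, smaller clusters. The minimal sketch below only scores a given partition of an unweighted graph; it does not perform the search, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels, resolution=1.0):
    """Resolution-weighted modularity of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    labels: dict mapping node -> cluster id.
    resolution: gamma; higher values penalize large clusters more strongly.
    """
    m = len(edges)
    degree = defaultdict(int)    # node -> number of incident edges
    internal = defaultdict(int)  # cluster -> edges with both ends inside it
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1

    cluster_degree = defaultdict(int)  # cluster -> sum of member degrees
    for node, d in degree.items():
        cluster_degree[labels[node]] += d

    # Q = sum over clusters of (L_c / m) - gamma * (d_c / 2m)^2
    q = 0.0
    for c, d_c in cluster_degree.items():
        q += internal[c] / m - resolution * (d_c / (2 * m)) ** 2
    return q


# Two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))
```

Splitting the two triangles apart scores higher than lumping all six nodes together, which scores 0; raising the resolution lowers the score of large clusters and so pushes the search toward finer partitions.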
UMAP
Once the graph-based clustering task has completed, we can visualize the results with a UMAP plot. The same steps can be used to generate a t-SNE plot instead. In this tutorial we use UMAP because it runs faster than t-SNE on data sets of several thousand cells.
- Click the circular PCA data node
- Click Exploratory analysis in the toolbox
- Click UMAP
- Set the number of principal components to 15 (Figure 11)
- Click Finish to run the task
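UMAP, like graph-based clustering, starts by building a k-nearest-neighbor graph of the cells in its input space, which here is the first 15 PC scores. The brute-force sketch below shows only that shared first step and how the number-of-PCs setting restricts which coordinates count toward distances. The function name and the default `k=10` are hypothetical, and real implementations typically use approximate neighbor search rather than comparing every pair of cells.

```python
import math


def knn_graph(pc_scores, n_pcs=15, k=10):
    """Return, for each cell, the indices of its k nearest neighbors.

    pc_scores: list of rows, one per cell, with scores ordered PC1, PC2, ...
    Only the first n_pcs columns are used, mirroring the
    "number of principal components" setting.
    Brute force: O(n^2) distance computations.
    """
    points = [row[:n_pcs] for row in pc_scores]
    neighbors = []
    for i, p in enumerate(points):
        # Euclidean distance to every other cell; ties broken by index.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Toy data: the third column plays the role of a late, noisy PC.
scores = [[0, 0, 100], [1, 0, -100], [10, 0, 0], [11, 0, 5]]
print(knn_graph(scores, n_pcs=2, k=1))
```

With `n_pcs=2` the noisy third column is ignored and cells pair up as expected; including it changes the neighbors, which is one reason the PC cut-off matters for both clustering and the embedding.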
Notes on Performing Exploratory Analysis with Protein or Gene Expression Data Only
In this tutorial, we have performed exploratory analysis on merged protein and gene expression data, and we will perform classification on the merged data in the next step.
It can be informative to perform exploratory analysis on the two feature types separately. For example, you might want to see how the clustering of the same cells differs between protein expression profiles and gene expression profiles.
To perform exploratory analysis on the two feature types separately, select the Merged counts data node, then click Pre-analysis tools followed by Split by feature type in the toolbox. This adds a Split by feature type task to the pipeline with two output data nodes: Antibody capture (protein data) and Gene expression (mRNA data). Both contain the same high-quality cells.
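Conceptually, splitting by feature type partitions the rows (features) of the merged matrix by their type label while leaving the cell columns untouched, which is why both outputs contain the same cells. The sketch below assumes type labels like those in 10x Genomics Cell Ranger feature files ("Gene Expression", "Antibody Capture"); the data layout and function name are hypothetical and are not how Partek Flow stores the data.

```python
def split_by_feature_type(features, counts):
    """Split a merged feature-by-cell matrix into one matrix per feature type.

    features: list of (feature_id, feature_type) tuples, one per row of counts.
    counts: list of rows (one per feature), each a list of per-cell counts.
    Returns a dict: feature_type -> (feature_ids, rows).
    Columns (cells) are not touched, so every output keeps the same cells.
    """
    out = {}
    for (fid, ftype), row in zip(features, counts):
        ids, rows = out.setdefault(ftype, ([], []))
        ids.append(fid)
        rows.append(row)
    return out


features = [
    ("gene_1", "Gene Expression"),
    ("protein_1", "Antibody Capture"),
    ("gene_2", "Gene Expression"),
]
counts = [[1, 0], [50, 3], [0, 7]]
parts = split_by_feature_type(features, counts)
print(sorted(parts))
```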
Exploratory analysis of the gene expression data follows the same steps as for the merged counts. Because there are a large number of genes, you will need to reduce the dimensionality with PCA, choose an appropriate number of PCs, and then perform downstream clustering and visualization (e.g., graph-based clustering and UMAP/t-SNE). Protein data is handled differently. There is no need to reduce the dimensionality because there are only a handful of features (17 proteins in this case), so you can proceed straight to downstream clustering and visualization. Figure 13 shows an example of how the pipeline might look if the data is split and analyzed separately.
Additional Assistance
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.