
...

Double-click the Data processing title bar to re-collapse (Figure 18)

Classify cells using Scatter plot

An alternative method to clustering and UMAP/t-SNE for classifying cells is using a scatter plot to visualize the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than gene expression data alone as the protein expression data has a better dynamic range and is less sparse.

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Scatter plot
Click Finish to run
Double-click the Scatter plot task node to open it
Click 2D to switch to a 2D plot style (Figure 44)

Figure: Viewing the 2D scatter plot

Similar to the t-SNE or UMAP scatter plots, each point on the plot is a single cell. The axes are set to features (gene or protein) in the data set by default, but can be set to any attribute or feature. On this plot, we can see that CD3_TotalSeqB is on the x-axis and CD4_TotalSeqB is on the y-axis. We can use our selection and filtering tools to perform a basic classification of CD4 and CD8 T cells.

Click the Features tab in the Selection / Filtering section of the control panel
Type CD3 in the ID search bar of the Features tab
Click CD3_TotalSeqB in the drop-down (Figure 45)

Figure: Filtering by values for a feature

Click Image Added to add a filter for CD3 protein expression
Set the CD3_TotalSeqB filter to >= 2

This will select any cell with >= 2 normalized count for CD3 protein. Selected cells are shown in bold on the plot and, because CD3_TotalSeqB is on one of the axes, the chosen cutoff can be easily evaluated (Figure 46).
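The filter above amounts to comparing each cell's normalized value for one feature against a cutoff. A minimal sketch of that idea in Python follows; the function name, the operator table, and the cell values are invented for illustration and are not part of Partek Flow.

```python
import operator

# Comparison operators a threshold filter might offer (assumed set).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def select_cells(cells, feature, op, cutoff):
    """Return IDs of cells whose value for `feature` passes the comparison."""
    compare = OPS[op]
    return [cell_id for cell_id, values in cells.items()
            if compare(values[feature], cutoff)]

# Toy normalized values, invented for illustration only.
cells = {
    "cell_a": {"CD3_TotalSeqB": 3.1},
    "cell_b": {"CD3_TotalSeqB": 0.4},
    "cell_c": {"CD3_TotalSeqB": 2.0},
}

at_or_above = select_cells(cells, "CD3_TotalSeqB", ">=", 2)
below = select_cells(cells, "CD3_TotalSeqB", "<", 2)
print(at_or_above)  # ['cell_a', 'cell_c']
print(below)        # ['cell_b']
```

Because the feature being filtered is also on a plot axis, the two groups fall on opposite sides of a straight line at the cutoff, which is what makes the choice easy to evaluate by eye.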

Figure: CD3+ cells are selected and shown in bold on the plot

The selected CD3+ cells are our T cells. We can filter to these cells before classifying them into CD4 and CD8 T cell subtypes.

Click Image Added to filter to include only the selected cells

Next, we can switch the x-axis to show CD8 protein expression so that we can perform our classification.

Click the X axis text box in the Plot setup section of the control panel
Click CD8a_TotalSeqB from the drop-down list (or type it and then select it if it is not visible)
Click Image Added to rescale the axes to the included cells

The x-axis now shows CD8a protein expression (Figure 47).

Figure: Switching axes on the scatter plot

We can now use a set of filters to select and classify the CD3+ CD4+ CD8- T cells.

Type CD4 in the ID search bar of the Features tab
Click CD4_TotalSeqB in the drop-down
Click Image Added to add a filter for CD4 protein expression
Set the CD4_TotalSeqB filter to >= 2
Type CD8a in the ID search bar of the Features tab
Click CD8a_TotalSeqB in the drop-down
Click Image Added to add a filter for CD8a protein expression
Set the CD8a_TotalSeqB filter to < 2

This will select the cells in the upper left-hand section of the plot (Figure 48).

Figure: Selecting CD3+ CD4+ CD8- cells

Click Classify selection
Name the group CD4 T cells
Click Save

We can now select and classify CD3+ CD4- CD8+ T cells using the filters we have already created.

Change CD4_TotalSeqB filter to < 1.5
Change CD8a_TotalSeqB filter to >= 2

This selects the cells in the lower right-hand section of the plot (Figure 49).
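The two gates in this section can be thought of as sets of threshold rules that must all hold at once. The sketch below is illustrative only: the helper names and cell values are invented, the CD8 T cell cutoffs follow the steps above, and the CD4+ cutoff of >= 2 is an assumption.

```python
def gate(cell, rules):
    """True if the cell passes every (feature, predicate) rule (logical AND)."""
    return all(predicate(cell[feature]) for feature, predicate in rules)

# The CD4+ cutoff (>= 2) is an assumption; the other cutoffs follow the tutorial steps.
CD4_T = [("CD4_TotalSeqB", lambda v: v >= 2), ("CD8a_TotalSeqB", lambda v: v < 2)]
CD8_T = [("CD4_TotalSeqB", lambda v: v < 1.5), ("CD8a_TotalSeqB", lambda v: v >= 2)]

def classify(cell):
    """Return a group name, or None if the cell falls outside both gates."""
    if gate(cell, CD4_T):
        return "CD4 T cells"
    if gate(cell, CD8_T):
        return "CD8 T cells"
    return None

# Toy normalized values for CD3+ cells, invented for illustration only.
cells = {
    "t1": {"CD4_TotalSeqB": 3.2, "CD8a_TotalSeqB": 0.3},
    "t2": {"CD4_TotalSeqB": 0.5, "CD8a_TotalSeqB": 3.0},
    "t3": {"CD4_TotalSeqB": 1.7, "CD8a_TotalSeqB": 2.5},
}

labels = {cell_id: classify(values) for cell_id, values in cells.items()}
print(labels)  # {'t1': 'CD4 T cells', 't2': 'CD8 T cells', 't3': None}
```

Cells such as t3 that sit between the two gates stay unclassified, which mirrors leaving ambiguous cells out of both groups on the plot.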

Figure: Selecting CD3+ CD4- CD8+ cells

Click Classify selection
Name the group CD8 T cells
Click Save

To view our classifications, we can clear the selection and color by classification.

Click Clear selection
Choose Classifications from the Color by drop-down menu (Figure 50).

Figure: Classified CD4 and CD8 T cells

Alternatively, we could have used the lasso tool to select the population of interest manually and then classified the selected cells.

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial.

Dimensional reduction and clustering with protein expression data

For experiments like CITE-Seq where we have many protein markers, we can use dimensional reduction and clustering to identify groups of similar cells based on their overall expression pattern.

In the Merged counts data node, we have two data types. We can choose to use one or both of the data types in our analysis. Here, we will run dimensional reduction and clustering on the protein expression data.

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Click Antibody Capture for Include features where "Feature type" is
Click Finish to run

If there are fewer than 50 proteins in the data set, all possible PCs (number of proteins - 1) are used by default. Because using all the PCs captures all of the variance in the data set, this is equivalent to running clustering on the original data. If your data set has more than 50 proteins and you want to run clustering on the full data instead of a subset of PCs, set the number of PCs to All in the advanced settings. We will discuss how to pick an optimal number of PCs for data with larger numbers of features, like gene expression data, in a subsequent section of the tutorial.
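One way to see why keeping every PC changes nothing for clustering: PCA with no components dropped is a centering step followed by a rotation, and neither alters the distances between cells. The sketch below checks this on a toy two-feature data set, keeping both components. The values are invented, and the code is a plain two-dimensional PCA written for illustration, not Partek Flow's implementation.

```python
import math
from itertools import combinations

def pca_2d(points):
    """Center 2-feature data and rotate it onto its principal axes."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centered)
    syy = sum(y * y for _, y in centered)
    sxy = sum(x * y for x, y in centered)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]

def pairwise_distances(points):
    """Euclidean distance for every pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

# Toy normalized values for two proteins in four cells (invented).
cells = [(0.2, 1.9), (2.5, 0.1), (2.8, 0.4), (0.1, 2.2)]
scores = pca_2d(cells)

distances_preserved = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(cells), pairwise_distances(scores))
)
print(distances_preserved)  # True
```

Since graph-based clustering depends on which cells are near each other, identical distances imply the same neighbors whether the input is the original values or the full set of PC scores.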

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that place cells with similar expression close together. An advantage of UMAP over t-SNE is that it preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.

Click the Clustering result data node
Click Exploratory analysis in the toolbox
Click UMAP
Click Antibody Capture for Include features where "Feature type" is
Click Finish to run

We can open the UMAP task report to view the clustering result.

Double-click the UMAP task node
Click 2D in the plot style section to switch to 2D

Each point on the plot is a cell and the cells are colored by their cluster assignments (Figure 39).

Figure: UMAP from protein expression data

Because we merged the gene and protein expression data, we can overlay protein and gene expression values on the plot.

Choose Expression from the Color by drop-down menu
Type CD4 in the search box and choose CD4_TotalSeqB from the drop-down (Figure 23)

Figure: Coloring by expression

Cells that express high levels of CD4 are colored blue on the plot (Figure ).

Figure: Coloring by CD4 protein expression

The cluster of cells expressing high levels of CD4 is likely our CD4 T cells. We can take a closer look at the CD4 T cell cluster to see if any subtypes can be identified using the clustering results and expression information.

Click Image Added to activate the lasso tool
Draw a lasso around the large blue group of cells at the bottom right of the plot to select them (Figure )

Figure: Selecting the CD4 cluster

Click Image Added to filter to include only the selected cells
Click Image Added to rescale the axes to the included cells

With that, let's take a look at the clustering results from the protein expression data for these cells.

Choose Graph-based from the Color by drop-down menu

Figure: Protein-based clustering results

Again, the colors here indicate the cluster assignment for each cell. Because we ran clustering using only the protein expression data, the cluster assignments are based on each cell's protein expression data. To help identify which cell types the clusters correspond to, we generate a group biomarkers table with every clustering result. Biomarkers are genes or proteins that are highly expressed in a cluster when compared with the other clusters. Please note that while the clustering was calculated using only the protein expression data, the biomarkers are drawn from both gene and protein expression data.
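The idea of a biomarker can be illustrated by ranking features on how much higher their average expression is inside one cluster than in all other cells. The sketch below uses a simple ratio of means with a pseudocount. This statistic is an assumption chosen for clarity, since the text does not say how the biomarkers table is computed, and all values and names are invented.

```python
from statistics import mean

def rank_biomarkers(expression, clusters, target, pseudocount=1.0):
    """Rank features by mean expression in `target` relative to all other cells."""
    inside = [cell for cell, cluster in clusters.items() if cluster == target]
    outside = [cell for cell, cluster in clusters.items() if cluster != target]
    features = list(next(iter(expression.values())))
    scores = {}
    for feature in features:
        inside_mean = mean(expression[cell][feature] for cell in inside)
        outside_mean = mean(expression[cell][feature] for cell in outside)
        # The pseudocount avoids division by zero for features absent elsewhere.
        scores[feature] = (inside_mean + pseudocount) / (outside_mean + pseudocount)
    return sorted(scores, key=scores.get, reverse=True)

# Toy values mixing a gene (CXCL13) and proteins (*_TotalSeqB), invented.
expression = {
    "c1": {"CXCL13": 4.0, "PD-1_TotalSeqB": 3.0, "CD4_TotalSeqB": 3.0},
    "c2": {"CXCL13": 5.0, "PD-1_TotalSeqB": 3.5, "CD4_TotalSeqB": 3.2},
    "c3": {"CXCL13": 0.0, "PD-1_TotalSeqB": 0.5, "CD4_TotalSeqB": 3.1},
    "c4": {"CXCL13": 0.2, "PD-1_TotalSeqB": 0.7, "CD4_TotalSeqB": 2.9},
}
clusters = {"c1": 8, "c2": 8, "c3": 1, "c4": 1}

ranking = rank_biomarkers(expression, clusters, target=8)
print(ranking)  # ['CXCL13', 'PD-1_TotalSeqB', 'CD4_TotalSeqB']
```

In this toy example CD4 ranks last because it is expressed at similar levels in both groups, so it does not distinguish the cluster even though its expression is high.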

The far-left cluster, cluster 8, has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Two of the other biomarkers are PD-1 protein, which is expressed in Tfh cells, promotes self-tolerance, and is a target for immunotherapy drugs; and TIGIT protein, another immunotherapy drug target that promotes self-tolerance.

To assess the specificity of these biomarkers to this cluster, we can overlay the expression on the scatter plot.

Choose Expression from the Color by drop-down menu
Type PD-1 in the search box and choose PD-1_TotalSeqB from the drop-down

PD-1 expression is highest in cluster 8 with uniformly strong expression throughout (Figure ).

Figure: PD-1 expression in helper T cells

Type PDCD1 in the Expression search box and choose PDCD1 from the drop-down

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PD-1 gene expression level (PDCD1) (Figure ).

Figure: PDCD1 (PD-1) gene expression does not form a clear pattern

Type CXCL13 in the Expression search box and choose CXCL13 from the drop-down

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8 (Figure ), so we will classify the cells from cluster 8 as Tfh cells.

Figure: CXCL13 expression is strong in cluster 8

Choose Graph-based from the Color by drop-down menu
Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel (Figure )

Figure: Choosing to select by cluster

Click the check box for 8 to select cluster 8 (Figure )

Figure: Selecting a cluster

Click Classify selection
Name the cells Tfh cells
Click Save

We can classify the remaining cells from this CD4+ group as Helper T cells.

Click Image Added to invert the selection and select the cells outside of cluster 8
Click Classify selection
Name the cells Helper T cells
Click Save

To return to the full data set, we can clear our selection and filter.

Click Clear selection
Click Clear filters

To visualize our classifications, we can color by Classifications.

Choose Classifications from the Color by drop-down menu (Figure )

Figure: Coloring by classification

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial.

Clustering and dimensional reduction with gene expression data

Because principal components are used as the input for both graph-based clustering and UMAP when working with gene expression data, it is important to determine an optimal number of PCs to use in downstream analysis.

Choosing the number of PCs

In this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data.

...

In this data set, a reasonable cut-off could be set anywhere between about 10 and 30 PCs. We will use 15 in downstream steps.
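As a rough illustration of how a cut-off can be reasoned about, the sketch below finds the smallest number of leading PCs whose cumulative share of variance reaches a target. This is one common heuristic and not necessarily the procedure used in the steps elided above. The variance values are invented, so the result says nothing about this data set.

```python
def pcs_for_variance(variances, target=0.9):
    """Smallest number of leading PCs whose cumulative variance share reaches `target`."""
    total = sum(variances)
    running = 0.0
    for k, variance in enumerate(variances, start=1):
        running += variance
        if running / total >= target:
            return k
    return len(variances)

# Per-PC variances in decreasing order, invented for illustration only.
toy_variances = [30, 18, 10, 6, 3, 1.5, 1.2, 1.1, 1.0, 0.9]

chosen = pcs_for_variance(toy_variances)
print(chosen)  # 5
```

A range of acceptable cut-offs, as described above, is typical: once the per-PC variance flattens out, adding or removing a few PCs changes the cumulative share very little.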

Cluster by Gene Expression data

CITE-Seq data includes both gene and protein expression information. When the data types are combined, we can perform downstream analysis using both data types. We will begin with the mRNA data. After determining the optimal number of PCs, we can proceed to clustering.

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Click Gene Expression for Include features where "Feature type" is
Click Configure to access the advanced settings
Set Number of principal components to 15
Click Apply
Click Finish to run (Figure 17)

...

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that group cells with similar expression into visible clusters.

Click the Clustering result data node
Click Exploratory analysis in the toolbox
Click UMAP
Click Gene Expression for Include features where "Feature type" is
Click Configure to access the advanced settings
Set Number of principal components to 15
Click Apply
Click Finish to run (Figure 18)

Figure: Running UMAP on the Gene Expression data

The Analyses tab now includes a UMAP task node (Figure 18).

Figure: Appearance of the Analyses tab after Graph-based clustering and UMAP on Gene Expression data

Double-click the UMAP task node to open the task report

...

Figure: UMAP calculated on Gene Expression values. Colored by Graph-based clustering results.

An advantage of UMAP over t-SNE is that it preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.

Click the 2D radio button for Plot style to switch to the 2D UMAP (Figure 20)

...

This will color the plot by NKG7 gene expression and CD4 protein expression, a marker for helper T cells. We can add a third feature.

Click to color by a third feature (gene or protein)
Type CD3 and choose CD3_TotalSeqB from the drop-down

...

We have now selected only cells that express >= 1 normalized count for NKG7 gene and CD3 protein, but also have <= 2 normalized count for CD4 protein (Figure 27).

Figure: Filtering using multiple genes and proteins

...

This will produce a Classified groups data node.

Clustering by protein expression

In addition to performing clustering by gene expression data, we can use the protein data for clustering and UMAP visualization.

Click the Classified groups data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Click Antibody Capture for Include features where "Feature type" is
Click Finish to run

Notice that we did not set the number of PCs in this case. If there are fewer than 50 proteins in the data set, all possible PCs are used by default. Because using all the PCs captures all of the variance in the data set, this is equivalent to running clustering on the original data. If your data set has more than 50 proteins and you want to run clustering on the full data instead of a subset of PCs, set the number of PCs to All in the advanced settings.

...

We can open the UMAP task report to view the clustering result.

Double-click the UMAP task node
Click Group biomarkers to minimize the biomarkers table
Click 2D in the plot style section to switch to 2D

UMAP using the protein expression data resolves the cell types we identified earlier on the gene expression UMAP (Figure 39).

Figure: UMAP from protein expression data

We can take a closer look at the helper T cell cluster to see if any additional cell types can be found using the protein expression data.

Click Image Removed to activate the lasso tool
Draw a lasso around the Helper T cell cluster to select it
Click Image Removed to filter to include only the selected cells
Click Image Removed to rescale the axes to the included cells

With that, let's take a look at the clustering results from the protein expression data for these cells.

Choose Graph-based from the Color by drop-down menu

Please note that Graph-based always refers to the most recent graph-based clustering result in the pipeline.

Click Group biomarkers to expand the biomarkers table
Select Graph-based from the Method drop-down menu (Figure 40)

Figure: Protein-based clustering results for Helper T cells

The far-left cluster, cluster 8, has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Two of the other biomarkers are PD-1 protein, which promotes self-tolerance and is a target for immunotherapy drugs, and TIGIT, another immunotherapy drug target.

Choose Expression from the Color by drop-down menu
Type PD-1 in the search box and choose PD-1_TotalSeqB from the drop-down

PD-1 expression is highest in cluster 8 with uniformly strong expression throughout (Figure 41).

Figure: PD-1 expression in helper T cells

Type PDCD1 in the Expression search box and choose PDCD1 from the drop-down

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PD-1 gene expression level (PDCD1) (Figure 42).

Figure: PDCD1 (PD-1) gene expression does not form a clear pattern

Type CXCL13 in the Expression search box and choose CXCL13 from the drop-down

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8, so we will classify these cells as Tfh (Figure 43).

Figure: CXCL13 expression is strong in cluster 8

Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel
Click the check box for 8 to select cluster 8
Click Classify selection
Name the cells Tfh cells
Click Save
Choose Classifications from the Color by drop-down menu
Click Clear selection
Click Clear filters to return to the full data set
Click Apply classifications

Classify cells using Scatter plot

An alternative method to clustering and UMAP/t-SNE for classifying cells is using a scatter plot to visualize the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than gene expression data alone as the protein expression data has a better dynamic range and is less sparse.

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Scatter plot
Click Finish to run
Double-click the Scatter plot task node to open it
Click 2D to switch to a 2D plot style (Figure 44)

Figure: Viewing the 2D scatter plot

Similar to the t-SNE or UMAP scatter plots, each point on the plot is a single cell. The axes are set to features (gene or protein) in the data set by default, but can be set to any attribute or feature. On this plot, we can see that CD3_TotalSeqB is on the x-axis and CD4_TotalSeqB is on the y-axis. We can use our selection and filtering tools to perform a basic classification of CD4 and CD8 T cells.

...

Figure: Filtering by values for a feature

Click Image Removed to add a filter for CD3 protein expression
Set the CD3_TotalSeqB filter to >= 2

This will select any cell with >= 2 normalized count for CD3 protein. Selected cells are shown in bold on the plot and, because CD3_TotalSeqB is on one of the axes, the chosen cutoff can be easily evaluated (Figure 46).

Figure: CD3+ cells are selected and shown in bold on the plot

The selected CD3+ cells are our T cells. We can filter to these cells before classifying them into CD4 and CD8 T cell subtypes.

Click Image Removed to filter to include only the selected cells

Next, we can switch the x-axis to show CD8 protein expression so that we can perform our classification.

Click the X axis text box in the Plot setup section of the control panel
Click CD8a_TotalSeqB from the drop-down list (or type it and then select it if it is not visible)
Click Image Removed to rescale the axes to the included cells

The x-axis now shows CD8a protein expression (Figure 47).

Figure: Switching axes on the scatter plot

We can now use a set of filters to select and classify the CD3+ CD4+ CD8- T cells.

Type CD4 in the ID search bar of the Features tab
Click CD4_TotalSeqB in the drop-down
Click Image Removed to add a filter for CD4 protein expression
Set the CD4_TotalSeqB filter to >= 2
Type CD8a in the ID search bar of the Features tab
Click CD8a_TotalSeqB in the drop-down
Click Image Removed to add a filter for CD8a protein expression
Set the CD8a_TotalSeqB filter to < 2

This will select the cells in the upper left-hand section of the plot (Figure 48).

Figure: Selecting CD3+ CD4+ CD8- cells

Click Classify selection
Name the group CD4 T cells
Click Save

We can now select and classify CD3+ CD4- CD8+ T cells using the filters we have already created.

Change CD4_TotalSeqB filter to < 1.5
Change CD8a_TotalSeqB filter to >= 2

This selects the cells in the lower right-hand section of the plot (Figure 49).

Figure: Selecting CD3+ CD4- CD8+ cells

Click Classify selection
Name the group CD8 T cells
Click Save

To view our classifications, we can clear the selection and color by classification.

...

Figure: Classified CD4 and CD8 T cells

An alternative approach to using the expression threshold filters is to draw a lasso around the population of interest using the lasso tool and then classify the selected cells.

Partek Flow Documentation

Versions Compared

Old Version 19

New Version 20
