Partek Flow Documentation

...

  • Double-click the Data processing title bar to re-collapse (Figure 18)

Classify cells using Scatter plot

An alternative method to clustering and UMAP/t-SNE for classifying cells is using a scatter plot to visualize the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than gene expression data alone as the protein expression data has a better dynamic range and is less sparse. 

  • Click the Merged counts data node
  • Click Exploratory analysis in the toolbox
  • Click Scatter plot
  • Click Finish to run 
  • Double-click the Scatter plot task node to open it
  • Click 2D to switch to a 2D plot style (Figure 44)

 

Figure: Viewing the 2D scatter plot

Similar to the t-SNE or UMAP scatter plots, each point on the plot is a single cell. The axes are set to features (gene or protein) in the data set by default, but can be set to any attribute or feature. On this plot, we can see that CD3_TotalSeqB is on the x-axis and CD4_TotalSeqB is on the y-axis. We can use our selection and filtering tools to perform a basic classification of CD4 and CD8 T cells. 

  • Click the Features tab in the Selection / Filtering section of the control panel
  • Type CD3 in the ID search bar of the Features tab
  • Click CD3_TotalSeqB in the drop-down (Figure 45)

Figure: Filtering by values for a feature

  • Click Image Added to add a filter for CD3 protein expression
  • Set the CD3_TotalSeqB filter to >= 2 

This selects every cell with a normalized CD3 protein count of 2 or more. Selected cells are shown in bold on the plot and, because CD3_TotalSeqB is on one of the axes, the chosen cutoff can be evaluated easily (Figure 46). 

 

Figure: CD3+ cells are selected and shown in bold on the plot

The selected CD3+ cells are our T cells. We can filter to these cells before classifying the CD4 and CD8 T cell sub-types.

  • Click Image Added to filter to include only the selected cells

Next, we can switch the x-axis to show CD8 protein expression so that we can perform our classification.

  • Click the X axis text box in the Plot setup section of the control panel
  • Click CD8a_TotalSeqB from the drop-down list (or type it and then select it if it is not visible)
  • Click Image Added to rescale the axes to the included cells

The x-axis now shows CD8a protein expression (Figure 47).

 

Figure: Switching axes on the scatter plot

We can now use a set of filters to select and classify the CD3+ CD4+ CD8- T cells.

  • Type CD4 in the ID search bar of the Features tab
  • Click CD4_TotalSeqB in the drop-down
  • Click Image Added to add a filter for CD4 protein expression
  • Set the CD4_TotalSeqB filter to >= 2 
  • Type CD8a in the ID search bar of the Features tab
  • Click CD8a_TotalSeqB in the drop-down
  • Click Image Added to add a filter for CD8a protein expression
  • Set the CD8a_TotalSeqB filter to < 2 

This will select the cells in the upper left-hand section of the plot (Figure 48). 

 

Figure: Selecting CD3+ CD4+ CD8- cells

  • Click Classify selection 
  • Name the group CD4 T cells
  • Click Save

We can now select and classify CD3+ CD4- CD8+ T cells using the filters we have already created.

  • Change CD4_TotalSeqB filter to < 1.5
  • Change CD8a_TotalSeqB filter to >= 2

This selects the cells in the lower right-hand section of the plot (Figure 49). 

 

Figure: Selecting CD3+ CD4- CD8+ cells

  • Click Classify selection 
  • Name the group CD8 T cells
  • Click Save
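
The gating in this section comes down to a few boolean tests on normalized protein values: CD3 of 2 or more for T cells, then CD4 of 2 or more with CD8a below 2 for CD4 T cells, and CD4 below 1.5 with CD8a of 2 or more for CD8 T cells. A minimal sketch in Python with NumPy, using made-up values for five cells. The `gate_t_cells` helper and the arrays are hypothetical and are not part of Partek Flow:

```python
import numpy as np

def gate_t_cells(cd3, cd4, cd8a):
    """Label cells with the expression thresholds from this section (illustrative only)."""
    cd3, cd4, cd8a = (np.asarray(a, dtype=float) for a in (cd3, cd4, cd8a))
    labels = np.full(cd3.shape, "N/A", dtype=object)
    is_t = cd3 >= 2                                    # CD3+ gate: keep T cells only
    labels[is_t & (cd4 >= 2) & (cd8a < 2)] = "CD4 T cells"    # upper-left region
    labels[is_t & (cd4 < 1.5) & (cd8a >= 2)] = "CD8 T cells"  # lower-right region
    return labels

# Made-up normalized counts for five cells (not taken from the tutorial data)
cd3 = [3.1, 2.5, 0.4, 2.8, 2.2]
cd4 = [2.9, 0.7, 3.0, 1.8, 1.0]
cd8a = [0.3, 3.4, 0.2, 0.5, 1.0]
labels = gate_t_cells(cd3, cd4, cd8a)
print(list(labels))
```

Cells that pass the CD3 gate but fall between the two regions, such as the fourth and fifth cells here, stay unclassified, which mirrors what happens on the plot when a point lies outside both selections.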

To view our classifications, we can clear the selection and color by classification.

  • Click Clear selection 
  • Choose Classifications from the Color by drop-down menu (Figure 50).

Figure: Classified CD4 and CD8 T cells

Alternatively, we could have used the lasso tool to select the population of interest manually and then classified the selected cells.

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial. 

Dimensional reduction and clustering with protein expression data

For experiments like CITE-Seq where we have many protein markers, we can use dimensional reduction and clustering to identify groups of similar cells based on their overall expression pattern. 

In the Merged counts data node, we have two data types. We can choose to use one or both of the data types in our analysis. Here, we will run dimensional reduction and clustering on the protein expression data.

  • Click the Merged counts data node
  • Click Exploratory analysis in the toolbox
  • Click Graph-based clustering 
  • Click Antibody Capture for Include features where "Feature type" is
  • Click Finish to run 

If there are fewer than 50 proteins in the data set, all possible PCs (number of proteins - 1) will be used by default and, because using all the PCs will capture all of the variance in the data set, this is equivalent to running clustering on the original data. If your data set has more than 50 proteins and you want to run clustering on the full data instead of a subset of PCs, set the number of PCs to All in the advanced settings. We will discuss how to pick an optimal number of PCs for data with larger numbers of features, like gene expression data, in a subsequent section of the tutorial.
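
The claim that keeping every PC is equivalent to working on the original data can be checked numerically: projecting centered data onto all principal axes is a rotation, so distances between cells do not change. The sketch below uses random made-up data and a plain SVD-based PCA in NumPy. It illustrates the idea only and does not reproduce how Partek Flow computes PCs internally:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))        # 200 cells x 17 proteins (made-up data)
Xc = X - X.mean(axis=0)               # center each protein

# PCA via SVD: the rows of Vt are the principal axes
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_all = Xc @ Vt.T                # scores on every PC
scores_5 = Xc @ Vt[:5].T              # scores on the first 5 PCs only

def pairwise_dist(A):
    """Euclidean distance between every pair of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d_orig = pairwise_dist(Xc)
same_with_all = np.allclose(d_orig, pairwise_dist(scores_all))
same_with_5 = np.allclose(d_orig, pairwise_dist(scores_5))
print(same_with_all, same_with_5)
```

Because graph-based clustering works from distances between cells, identical distances give the clustering step the same input, while a truncated set of PCs changes it.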

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that place cells with similar expression close together. An advantage of UMAP over t-SNE is that it preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.  

  • Click the Clustering result data node 
  • Click Exploratory analysis in the toolbox
  • Click UMAP
  • Click Antibody Capture for Include features where "Feature type" is
  • Click Finish to run

We can open the UMAP task report to view the clustering result.

  • Double-click the UMAP task node
  • Click 2D in the plot style section to switch to 2D

Each point on the plot is a cell and the cells are colored by their cluster assignments (Figure 39).

 

Figure: UMAP from protein expression data

Because we merged the gene and protein expression data, we can overlay protein and gene expression values on the plot.

  • Choose Expression from the Color by drop-down menu
  • Type CD4 in the search box and choose CD4_TotalSeqB from the drop-down (Figure 23)

Figure: Coloring by expression

Cells that express high levels of CD4 are colored blue on the plot (Figure ).

 

Figure: Coloring by CD4 protein expression

The cluster of cells expressing high levels of CD4 is likely our CD4 T cells. We can take a closer look at the CD4 T cell cluster to see if any sub-types can be identified using the clustering results and expression information.

  • Click Image Added to activate the lasso tool
  • Draw a lasso around the large blue group of cells at the bottom right of the plot to select them (Figure )

Figure: Selecting the CD4 cluster

  • Click Image Added to filter to include only the selected cells
  • Click Image Added to rescale the axes to the included cells 

With that, let's take a look at the clustering results from the protein expression data for these cells. 

  • Choose Graph-based from the Color by drop-down menu

 

Figure: Protein-based clustering results

Again, the colors here indicate the cluster assignment for each cell. Because we ran clustering using only the protein expression data, the cluster assignments are based on each cell's protein expression data. To help identify which cell types the clusters correspond to, we generate a group biomarkers table with every clustering result. Biomarkers are genes or proteins that are expressed highly in a cluster when compared with the other clusters. Please note that while the clustering was calculated using only the protein expression data, the biomarkers are drawn from both gene and protein expression data. 
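
To make the idea of a biomarker concrete, the sketch below ranks features by how much higher their mean expression is in one cluster than in all other cells. This is a deliberately simple stand-in written for illustration: the statistic Partek Flow actually uses for the group biomarkers table is not described here, and the `top_biomarkers` helper, feature names, and values are all made up.

```python
import numpy as np

def top_biomarkers(expr, clusters, feature_names, cluster_id, n_top=3):
    """Rank features by mean in the chosen cluster minus mean in all other cells."""
    expr = np.asarray(expr, dtype=float)
    in_cluster = np.asarray(clusters) == cluster_id
    diff = expr[in_cluster].mean(axis=0) - expr[~in_cluster].mean(axis=0)
    order = np.argsort(diff)[::-1][:n_top]        # largest difference first
    return [(feature_names[i], float(diff[i])) for i in order]

# Made-up normalized values: 6 cells x 3 features (genes and proteins mixed)
expr = [[5.0, 1.0, 0.2],
        [4.5, 1.2, 0.1],
        [0.5, 1.1, 3.0],
        [0.4, 0.9, 2.8],
        [0.6, 1.0, 3.1],
        [0.3, 1.0, 2.9]]
clusters = [8, 8, 1, 1, 1, 1]
features = ["gene_A", "gene_B", "protein_C"]
ranked = top_biomarkers(expr, clusters, features, cluster_id=8)
print(ranked)
```

Because the expression matrix can hold gene and protein columns side by side, a ranking like this can return markers of either type even when the cluster labels came from protein data alone.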

The far-left cluster, cluster 8, has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Two of the other biomarkers are PD-1 protein, which is expressed in Tfh cells, promotes self-tolerance, and is a target for immunotherapy drugs; and TIGIT protein, another immunotherapy drug target that promotes self-tolerance. 

To assess the specificity of these biomarkers to this cluster, we can overlay the expression on the scatter plot. 

  • Choose Expression from the Color by drop-down menu
  • Type PD-1 in the search box and choose PD-1_TotalSeqB from the drop-down

PD-1 expression is highest in cluster 8 with uniformly strong expression throughout (Figure ).

 

Figure: PD-1 expression in helper T cells

  • Type PDCD1 in the Expression search box and choose PDCD1 from the drop-down

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PD-1 gene expression level (PDCD1) (Figure ).

 

Figure: PDCD1 (PD-1) gene expression does not form a clear pattern

  • Type CXCL13 in the Expression search box and choose CXCL13 from the drop-down

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8 (Figure ), so we will classify the cells from cluster 8 as Tfh cells. 

 

Figure: CXCL13 expression is strong in cluster 8

  • Choose Graph-based from the Color by drop-down menu
  • Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel (Figure )

Figure: Choosing to select by cluster

  • Click the check box for cluster 8 to select it (Figure )

Figure: Selecting a cluster

  • Click Classify selection 
  • Name the cells Tfh cells
  • Click Save 

We can classify the remaining cells from this CD4+ group as Helper T cells. 

  • Click Image Added to invert the selection and select the cells outside of cluster 8
  • Click Classify selection 
  • Name the cells Helper T cells
  • Click Save 

To return to the full data set, we can clear our selection and filter.

  • Click Clear selection
  • Click Clear filters

To visualize our classifications, we can color by Classifications.

  • Choose Classifications from the Color by drop-down menu (Figure )

 

Figure: Coloring by classification

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial. 

Clustering and dimensional reduction with gene expression data

Because principal components are used as the input for both graph-based clustering and UMAP when working with gene expression data, it is important to determine an optimal number of PCs to use in downstream analysis. 

Choosing the number of PCs

In this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data. 

...

In this data set, a reasonable cut-off could be set anywhere between around 10 and 30 PCs. We will use 15 in downstream steps. 
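
One common way to judge a cut-off is to look at how much variance each successive PC explains and where the values level off. The sketch below computes those values with NumPy on synthetic data that has three strong underlying signals plus noise. The data, the seed, and the choice of three signals are assumptions made for illustration and are unrelated to the tutorial data set:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_genes, n_signals = 300, 40, 3

# Synthetic matrix: a few strong latent signals plus unit-variance noise
latent = rng.normal(size=(n_cells, n_signals))
loadings = rng.normal(size=(n_signals, n_genes)) * 5.0
X = latent @ loadings + rng.normal(size=(n_cells, n_genes))

Xc = X - X.mean(axis=0)                            # center each gene
S = np.linalg.svd(Xc, compute_uv=False)            # singular values, largest first
explained = S ** 2 / np.sum(S ** 2)                # fraction of variance per PC
cumulative = np.cumsum(explained)

for i in range(6):
    print(f"PC{i + 1}: {explained[i]:.3f} (cumulative {cumulative[i]:.3f})")
```

In this synthetic case the values drop sharply after the third PC. Real gene expression data rarely shows such a clean break, which is why a range of cut-offs can be reasonable.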

Cluster by Gene Expression data

CITE-Seq data includes both gene and protein expression information. When the data types are combined, we can perform downstream analysis using both data types. We will begin with the mRNA data. After determining the optimal number of PCs, we can proceed to clustering. 

  • Click the Merged counts data node
  • Click Exploratory analysis in the toolbox
  • Click Graph-based clustering 
  • Click Gene Expression for Include features where "Feature type" is
  • Click Configure to access the advanced settings
  • Set Number of principal components to 15
  • Click Apply 
  • Click Finish to run (Figure 17)

...

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that group cells with similar expression into visible clusters. 

  • Click the Clustering result data node 
  • Click Exploratory analysis in the toolbox
  • Click UMAP
  • Click Gene Expression for Include features where "Feature type" is
  • Click Configure to access the advanced settings
  • Set Number of principal components to 15
  • Click Apply 
  • Click Finish to run (Figure 18)

Figure: Running UMAP on the Gene Expression data

The Analyses tab now includes a UMAP task node (Figure 18).

 

Figure: Appearance of the Analyses tab after Graph-based clustering and UMAP on Gene Expression data

  • Double-click the UMAP task node to open the task report

...

Figure: UMAP calculated on Gene Expression values. Colored by Graph-based clustering results.

An advantage of UMAP over t-SNE is that it preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.  

  • Click the 2D radio button for Plot style to switch to the 2D UMAP (Figure 20)

...

This will color the plot by NKG7 gene expression and CD4 protein expression, a marker for helper T cells. We can add a third feature.

  • Click  to color by a third feature (gene or protein)
  • Type CD3 and choose CD3_TotalSeqB from the drop-down

...

We have now selected only cells that express >= 1 normalized count for NKG7 gene and CD3 protein, but also have <= 2 normalized count for CD4 protein (Figure 27).

 

Figure: Filtering using multiple genes and proteins

...

This will produce a Classified groups data node. 

Clustering by protein expression

In addition to performing clustering by gene expression data, we can use the protein data for clustering and UMAP visualization. 

  • Click the Classified groups data node
  • Click Exploratory analysis in the toolbox
  • Click Graph-based clustering 
  • Click Antibody Capture for Include features where "Feature type" is
  • Click Finish to run 

Notice that we did not set the number of PCs in this case. If there are fewer than 50 proteins in the data set, all possible PCs will be used by default and, because using all the PCs will capture all of the variance in the data set, this is equivalent to running clustering on the original data. If your data set has more than 50 proteins and you want to run clustering on the full data instead of a subset of PCs, set the number of PCs to All in the advanced settings.

...

 

...

We can open the UMAP task report to view the clustering result.

  • Double-click the UMAP task node
  • Click Group biomarkers to minimize the biomarkers table
  • Click 2D in the plot style section to switch to 2D

UMAP using the protein expression data resolves the cell types we identified earlier on the gene expression UMAP (Figure 39).

 

Figure: UMAP from protein expression data

We can take a closer look at the helper T cell cluster to see if any additional cell types can be found using the protein expression data.

  • Click Image Removed to activate the lasso tool
  • Draw a lasso around the Helper T cell cluster to select it
  • Click Image Removed to filter to include only the selected cells
  • Click Image Removed to rescale the axes to the included cells 

With that, let's take a look at the clustering results from the protein expression data for these cells.

  • Choose Graph-based from the Color by drop-down menu

Please note that Graph-based always refers to the most recent graph-based clustering result in the pipeline. 

  • Click Group biomarkers to expand the biomarkers table
  • Select Graph-based from the Method drop-down menu (Figure 40)

 

Figure: Protein-based clustering results for Helper T cells

The far-left cluster, cluster 8, has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Two of the other biomarkers are PD-1 protein, which promotes self-tolerance and is a target for immunotherapy drugs, and TIGIT, another immunotherapy drug target. 

  • Choose Expression from the Color by drop-down menu
  • Type PD-1 in the search box and choose PD-1_TotalSeqB from the drop-down

PD-1 expression is highest in cluster 8 with uniformly strong expression throughout (Figure 41).

 

Figure: PD-1 expression in helper T cells

  • Type PDCD1 in the Expression search box and choose PDCD1 from the drop-down

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PD-1 gene expression level (PDCD1) (Figure 42). 

 

Figure: PDCD1 (PD-1) gene expression does not form a clear pattern

  • Type CXCL13 in the Expression search box and choose CXCL13 from the drop-down

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8, so we will classify these cells as Tfh (Figure 43). 

 

Figure: CXCL13 expression is strong in cluster 8

  • Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel 
  • Click the check box for cluster 8 to select it
  • Click Classify selection 
  • Name the cells Tfh cells
  • Click Save 
  • Choose Classifications from the Color by drop-down menu
  • Click Clear selection 
  • Click Clear filters to return to the full data set
  • Click Apply classifications 
