...
Exploratory Analysis Results
- Double click the merged UMAP data node
- Under Configure on the left, click Style, select the Graph-based cluster node, and color by the Graph-based attribute (Figure 1)
...
Figure: Color the cells in the UMAP plot by their graph-based cluster assignment
The 3D UMAP plot opens in a new data viewer session (Figure 2). Each point is a different cell and they are clustered based on how similar their expression profiles are across proteins and genes. Because a graph-based clustering task was performed upstream, a biomarker table is also displayed under the plot. This table lists the proteins and genes that are most highly expressed in each graph-based cluster. The graph-based clustering found 11 clusters, so there are 11 columns in the biomarker table.
...
Figure: Add a 2D scatter plot and place it to the right of the UMAP plot
- Click Merged counts to use as data for the 2D scatter plot (Figure 3)
...
Figure: Choose Merged counts data to draw the 2D scatter plot
A 2D scatter plot has been added to the right of the UMAP plot. The points in the 2D scatter plot are the same cells as in the UMAP, but they are positioned along the x- and y-axes according to their expression level for two protein markers: CD3_TotalSeqB and CD4_TotalSeqB, respectively (Figure 4).
...
Figure: The canvas now has a 2D scatter plot next to the UMAP
- In Select & Filter, click Criteria to change the selection mode
- Click the blue circle next to the Add rule drop-down menu (Figure 5)
...
Figure: Click the blue circle to change the data source for the rule selector
- Click Merged counts to change the data source
- Choose CD3_TotalSeqB from the drop-down list (Figure 6)
...
Figure: Choose the CD3_TotalSeqB protein marker as a selection rule
- Click and drag the slider on the CD3_TotalSeqB selection rule to include the CD3 positive cells (Figure 7)
...
Figure: Use the slider to select cells with positive expression for the CD3 protein marker
As you move the slider up and down, the corresponding points on both plots update dynamically. Cells with high expression of the CD3 protein marker (a marker for T cells) are highlighted, and the deselected points are dimmed (Figure 8).
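Conceptually, the slider applies a minimum-value threshold to one feature. The following is a minimal sketch in Python of that idea, not the tool's implementation: the cell IDs, the expression values, and the `select_by_minimum` helper are all hypothetical and do not come from the tutorial data set.

```python
# Hypothetical normalized CD3_TotalSeqB values per cell (illustrative only).
cd3_expression = {
    "cell_1": 0.2,
    "cell_2": 3.1,
    "cell_3": 1.4,
    "cell_4": 4.8,
    "cell_5": 0.0,
}


def select_by_minimum(values, minimum):
    """Split cells into (selected, dimmed) lists for a 'value >= minimum' rule."""
    selected = sorted(cell for cell, v in values.items() if v >= minimum)
    dimmed = sorted(cell for cell, v in values.items() if v < minimum)
    return selected, dimmed


# Moving the slider corresponds to changing `minimum` and re-evaluating the rule.
selected, dimmed = select_by_minimum(cd3_expression, minimum=2.0)
print("selected:", selected)
print("dimmed:", dimmed)
```

Because the rule is evaluated per cell rather than per plot, the same selected set can be highlighted on every plot that shows those cells.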
...
Figure: CD3+ cells are selected on both plots
- Click Merged counts in Get data on the left under Setup
- Click and drag CD8a_TotalSeqB onto the 2D scatter plot (Figure 9)
- Drop CD8_TotalSeqB onto the x-axis configuration option
...
Figure: Change the feature plotted on the x-axis to CD8_TotalSeqB
The CD3 positive cells are still selected, but now you can see how they separate into CD4 and CD8 positive populations (Figure 10).
...
Figure: 2D scatter plot with CD4_TotalSeqB and CD8_TotalSeqB features on the axes
The simplest way to classify cell types is to look for the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone because the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. Let's compare the resolving power of the CD4 and CD8A gene expression markers with that of their protein counterparts.
...
Figure: The second 2D scatter plot (bottom) has the CD8 and CD4 genes plotted against each other
- On the first 2D scatter plot (with protein markers), click in the top right corner
- Manually select the cells with high expression of the CD4_TotalSeqB protein marker (Figure 13)
...
Figure: Draw a lasso to manually select CD4+ cells, based on protein expression
Let's perform the same test on the gene expression data.
...
Figure: Draw a lasso to manually select CD4+ (mRNA) cells
This time, only 500 cells show positive expression for the CD4 marker gene. This means that the protein data is less sparse (i.e. there are fewer zero counts), which further helps to reliably detect sub-populations.
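Sparsity here can be read as the fraction of cells with a zero count for a feature. As a rough illustration only, the Python sketch below compares that fraction for two made-up count vectors; the numbers and the `zero_fraction` helper are assumptions for demonstration and are not taken from the tutorial data.

```python
def zero_fraction(counts):
    """Return the fraction of entries that are exactly zero."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(1 for c in counts if c == 0) / len(counts)


# Hypothetical raw counts for the same eight cells (illustrative only).
cd4_protein_counts = [12, 0, 35, 7, 41, 3, 0, 18]
cd4_gene_counts = [0, 0, 2, 0, 1, 0, 0, 0]

protein_sparsity = zero_fraction(cd4_protein_counts)
gene_sparsity = zero_fraction(cd4_gene_counts)

print(f"protein zero fraction: {protein_sparsity:.2f}")
print(f"gene zero fraction:    {gene_sparsity:.2f}")
```

A lower zero fraction means more cells carry a usable signal for the marker, which is why a positive population is easier to delineate on the protein plot.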
...
Based on the exploratory analysis above, most of the CD3 positive cells are in the group of cells in the bottom right corner of the UMAP plot. This is likely to be a group of T cells. We will now examine this group in more detail to identify T cell sub-populations.
...
Figure: Select the group of putative T cells
- Click in the Select & Filter tool to include the selected points
- Click in the top right of the plot to switch back to pointer mode
- Click and drag the plot to rotate it around
Deselected cells are excluded and the axes have been rescaled to give better resolution of the selected points (Figure 16). Note that the UMAP has not been recalculated; only the axes have been rescaled.
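The distinction between rescaling and recalculating can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the viewer is implemented: the coordinates are invented, two dimensions are used instead of three for brevity, and `axis_limits` is a hypothetical helper.

```python
# Hypothetical 2D UMAP coordinates per cell (illustrative only).
umap_coords = {
    "cell_1": (-8.0, 2.0),
    "cell_2": (5.5, -6.0),
    "cell_3": (7.0, -4.5),
    "cell_4": (6.2, -7.1),
}
selected = {"cell_2", "cell_3", "cell_4"}


def axis_limits(coords):
    """Return ((x_min, x_max), (y_min, y_max)) for the given points."""
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    return (min(xs), max(xs)), (min(ys), max(ys))


# Filtering keeps each cell's original coordinates; nothing is re-embedded.
kept = {cell: xy for cell, xy in umap_coords.items() if cell in selected}

# Only the plotting window changes to fit the remaining points.
x_limits, y_limits = axis_limits(kept)
print("x limits:", x_limits)
print("y limits:", y_limits)
```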
Figure: Group of putative T-cells
This group of putative T cells predominantly consists of cells assigned to graph-based clusters 3, 4, 6, and 7, indicated by the colors. Examining the biomarker table for these clusters can help us infer the different types of T cells.
- Add the Biomarkers table using the Table option in the New plot menu. You can drag and reposition the table using the button in the top left corner of the plot.
- Click and drag the bar between the UMAP plot and the biomarker table to resize the biomarker table to see more of it (Figure 17)
If you need to create more space on the canvas, hide the panel words on the left using the arrow.
Figure: Resize plots to see more of the biomarker table
Cluster 6 has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Another biomarker is the PD-1 protein, which is expressed in Tfh cells. This protein promotes self-tolerance and is a target for immunotherapy drugs. The TIGIT protein is also expressed in cluster 6 and is another immunotherapy drug target that promotes self-tolerance.
...
Figure: Click and drag the gene from the biomarker table onto the plot
- Click and drag the NKG7 gene from the biomarker table onto the duplicate UMAP plot
- Drop the NKG7 gene onto the Red (feature) option
...
Figure: The cells in the UMAP plot on the right are colored by their expression of CXCL13 (green) and NKG7 (red) marker genes. These cells belong to graph-based clusters 6 and 4, respectively, shown in the plot on the left
- In Select & Filter, click the icon that removes the CD3_TotalSeqB filtering rule
- Click the blue circle next to the Add criteria drop-down list
- Search for Graph to search for a data source
- Select Graph-based clustering (derived from the Merged counts > PCA data nodes)
- Click the Add criteria drop-down list and choose Graph-based to add a selection rule (Figure 20)
...
Figure: Change the data source to Graph-based clustering and choose Graph-based from the drop-down list
- In the Graph-based filtering rule, click All to deselect all cells
- Click cluster 6 to select all cells in cluster 6
- Using the Classify tool, click Classify selection
- Label the cells as Tfh cells (Figure 21)
- Click Save
...
Figure: Select all cluster 6 cells and classify them as Tfh cells
- Click in Select & Filter to exclude the cluster 6/Tfh cells
- Click cluster 4 to select all cells in cluster 4
- In the Classify icon, click Classify selection
- Label the cells as Cytotoxic cells
- Click Save
- Click in Select & Filter to exclude the cluster 4/Cytotoxic cells
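The classification steps above amount to mapping graph-based cluster numbers to cell-type labels. A minimal Python sketch of that mapping follows; the cell IDs and cluster assignments are hypothetical, the `classify` helper is an illustrative name, and only the two cluster-to-label pairs come from the steps above.

```python
# Hypothetical graph-based cluster assignment per cell (illustrative only).
cluster_by_cell = {
    "cell_1": 6,
    "cell_2": 4,
    "cell_3": 3,
    "cell_4": 6,
    "cell_5": 4,
}

# Cluster-to-label pairs used in the steps above.
label_by_cluster = {6: "Tfh cells", 4: "Cytotoxic cells"}


def classify(cluster_by_cell, label_by_cluster):
    """Label every cell whose cluster has a label; other cells stay unclassified."""
    return {
        cell: label_by_cluster[cluster]
        for cell, cluster in cluster_by_cell.items()
        if cluster in label_by_cluster
    }


classifications = classify(cluster_by_cell, label_by_cluster)
for cell, label in sorted(classifications.items()):
    print(cell, "->", label)
```

Cells in clusters without a label, such as `cell_3` here, are simply left out of the result, mirroring cells that remain unclassified in the viewer.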
...
Figure: Color by New classifications (T cell subsets)
B cells
In addition to T cells, we would expect to see B lymphocytes, at least some of which are malignant, in a MALT tumor sample. We can color the plot by expression of a B cell marker to locate these cells on the UMAP plot.
...
Figure: Cells in UMAP plot colored by their expression of CD19 protein
- Click in the top right corner of the UMAP plot
- Lasso around the CD19 positive cells (Figure 25)
- Click in Select & Filter to include the selected points
...
Figure: Lasso around CD19 positive cells
The plots will rescale to include the selected points. The CD19 positive cells include cells from graph-based clusters 1, 2, 5, 6, 7, 8, 9, and 10 (Figure 26).
Figure: Filtered CD19 positive cells
Inspection of the biomarker table shows that clusters 6 and 7 both show signs of expressing T cell markers (e.g. CD3D and IL7R genes, and CD3 protein), and we have seen previously that these clusters likely correspond to T cells.
- Find the CD3_TotalSeqB protein marker in the biomarker table
- Click and drag the CD3_TotalSeqB onto the UMAP plot on the right
- Drop the CD3_TotalSeqB protein marker onto the Color configuration option on the plot (Figure 27)
...
Figure: Some cells within the CD19 positive clusters show signs of expressing T cell markers
- Select either of the UMAP plots
- Click Select & Filter
- Click to select clusters 6 and 7
- Click the Classify icon then click Classify selection
- Label the cells as Doublets
- Click Save
- In Select & Filter, click the icon that excludes the selected points
There still appear to be some CD3 positive cells left on the plot, even after clusters 6 and 7 have been excluded.
- Click the icon that removes the Graph-based selection rule from Select & Filter
- Find the CD3_TotalSeqB protein marker in the biomarker table
- Click and drag CD3_TotalSeqB onto the Add criteria drop-down list in Select & Filter (Figure 28)
- Set the minimum threshold to 3 in the CD3_TotalSeqB selection (Figure 29)
- Click the Classify icon then click Classify selection
- Label the cells as Doublets
- Click Save
- In Select & Filter, click the icon that excludes the selected points
Figure: Click and drag the CD3 protein marker directly onto the Add criteria drop-down list to create a selection criterion
Figure: Select the remaining CD3 positive doublet cells
The remaining CD3 positive cells have been added to the Doublet classification and removed from the plot.
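The doublet clean-up can be summarized as: among cells already filtered to CD19 positive, label any cell whose CD3 protein signal reaches a chosen minimum as a doublet, then exclude it. The Python sketch below illustrates that logic only. The cell IDs and values are invented, `flag_doublets` is a hypothetical helper, and the minimum of 3 is passed as a parameter to mirror the threshold step above rather than as a recommended cut-off.

```python
# Hypothetical CD3_TotalSeqB values for cells already filtered to CD19 positive.
cd3_in_cd19_positive = {
    "cell_a": 0.1,
    "cell_b": 3.6,
    "cell_c": 2.9,
    "cell_d": 5.2,
}


def flag_doublets(cd3_values, minimum):
    """Label cells whose CD3 value is at or above `minimum` as 'Doublets'."""
    return {
        cell: "Doublets"
        for cell, value in cd3_values.items()
        if value >= minimum
    }


doublets = flag_doublets(cd3_in_cd19_positive, minimum=3)

# Excluding the flagged cells leaves the putative B cells for further analysis.
remaining = sorted(set(cd3_in_cd19_positive) - set(doublets))
print("doublets:", sorted(doublets))
print("remaining:", remaining)
```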
...
The biomarkers for clusters 1 and 2 also show an interesting pattern. Cluster 1 lists IGHD as its top biomarker, while cluster 2 lists IGHA1 as the fourth most significant. Both IGHD (Immunoglobulin Heavy Constant Delta) and IGHA1 (Immunoglobulin Heavy Constant Alpha 1) encode classes of the immunoglobulin heavy chain constant region. IGHD is part of IgD, which is expressed by mature B cells, and IGHA1 is part of IgA1, which is expressed by activated B cells. We can color the plot by both of these genes to visualize their expression.
...
Figure: The B cells colored by IGHD (green) and IGHA1 (red) gene expression
We can use the lasso tool to select and classify these populations.
...
Figure: Lasso around the IGHD positive cells
- Lasso around the IGHA1 positive cells (Figure 32)
- In the Classify icon on the left, click Classify selection
- Label the cells as Activated B cells
- Click Save
...
Figure: Select IGHA1 positive cells
We can now visualize our classifications.
...
Figure: UMAP with cells colored by cell types
- Click Apply classifications in the Classify icon
- Choose the Merged counts data node as input for the classification task (Figure 34)
Figure: In the pipeline preview, select the Merged counts data node as input for the classification task
- Click Select
- Name the attribute Cell type
- Click Run
- Click OK to close the message about a classification task being enqueued
Optionally, you may wish to save this data viewer session if you need to go back and reclassify cells later. To save the session, click the icon on the left and name the session.
A Classify task will be added to the pipeline, producing a Classify results data node.
- Click the project name at the top to go back to the Analyses tab (Figure 35)
Figure: Pipeline after Classification of B and T cell sub-types
...