...
The Single cell counts data node contains two different types of data: mRNA expression and protein expression. So that we can process these two types of data separately, we will split the data by data type.
- Click the Single cell counts data node
- Click the Pre-analysis tools section in the toolbox
- Click Split matrix
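Conceptually, the split groups features by their data type. The following is a minimal Python sketch of that idea, not Partek Flow's implementation; the dictionary layout, the function name `split_by_feature_type`, and the count values are assumptions made for illustration.

```python
from collections import defaultdict

def split_by_feature_type(counts, feature_types):
    """Group features into one matrix per data type.

    counts: {feature ID: [count per cell]} (hypothetical layout)
    feature_types: {feature ID: data type label}
    """
    split = defaultdict(dict)
    for feature, values in counts.items():
        split[feature_types[feature]][feature] = values
    return dict(split)

# Illustrative values only, not taken from the tutorial data set
counts = {"CD3_TotalSeqB": [12, 0, 7], "NKG7": [0, 3, 1]}
feature_types = {"CD3_TotalSeqB": "Antibody Capture", "NKG7": "Gene Expression"}
matrices = split_by_feature_type(counts, feature_types)
```

Each resulting matrix can then be filtered and normalized on its own, which is the reason for splitting before pre-processing.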
...
An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts; these are low-quality cells that can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts. You can do this in Partek Flow using the Single cell QA/QC task.
...
- Click the Antibody Capture data node
- Click the QA/QC section in the toolbox
- Click Single Cell QA/QC
- Choose the assembly and annotation used for the gene expression data (Figure 3) from the drop-down menus
- Click Finish
...
- Set the Counts filter to Keep cells between 1500 and 15000
- Set the Detected genes filter to Keep cells between 400 and 4000
- Set the Mitochondrial counts filter to Keep cells between 0% and 20% (Figure 8)
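The three filters above amount to a per-cell range check. A minimal Python sketch follows, assuming the ranges are inclusive at both ends (the tutorial does not state this) and using made-up cell metrics; `passes_qc` is a hypothetical helper, not a Partek Flow function.

```python
def passes_qc(total_counts, detected_genes, mito_percent):
    """Return True if a cell falls inside all three filter ranges.

    Thresholds come from the steps above; inclusive bounds are an assumption.
    """
    return (
        1500 <= total_counts <= 15000
        and 400 <= detected_genes <= 4000
        and 0 <= mito_percent <= 20
    )

# Illustrative metrics: (total counts, detected genes, mitochondrial %)
cells = {
    "cell_a": (5200, 1800, 4.5),   # inside every range
    "cell_b": (900, 350, 2.0),     # too few counts and detected genes
    "cell_c": (7000, 2500, 35.0),  # mitochondrial percentage too high
}
kept = [name for name, metrics in cells.items() if passes_qc(*metrics)]
```

A cell is kept only if it passes all three checks, so a single failing metric is enough to exclude it.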
...
We will start with the protein data. We will normalize this data using Centered log-ratio (CLR). CLR was used to normalize antibody capture protein counts data in the paper that introduced CITE-Seq (Stoeckius et al. 2017) and in subsequent publications on similar assays (Stoeckius et al. 2018, Mimitou et al. 2018). CLR normalization includes the following steps: Add 1, Divide by Geometric mean, Add 1, log base e.
- Click the Filtered single cell counts data node produced by filtering the Antibody Capture data node
- Click the Normalization and scaling section in the toolbox
- Click Normalization
- Click the green plus next to CLR or drag CLR to the right-hand panel
- Click Finish to run (Figure 10)
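The four CLR steps listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Partek Flow's code: the geometric mean is assumed to be taken per cell, across all proteins, on the counts after the first "Add 1" step, and the input counts are made up.

```python
import math

def clr_normalize(cell_counts):
    """Apply the listed CLR steps to one cell's protein counts."""
    # Step 1: add 1 so that zero counts have a defined logarithm
    shifted = [c + 1 for c in cell_counts]
    # Step 2: divide by the geometric mean (assumed: per cell, across proteins)
    geo_mean = math.exp(sum(math.log(v) for v in shifted) / len(shifted))
    ratios = [v / geo_mean for v in shifted]
    # Steps 3 and 4: add 1 again, then take the natural log
    return [math.log(r + 1) for r in ratios]

# Illustrative counts for three proteins in a single cell
normalized = clr_normalize([0, 9, 99])
```

In this example the shifted counts are 1, 10, and 100, whose geometric mean is 10, so the protein at the geometric mean ends up with a normalized value of ln(2).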
...
- Click the Filtered single cell counts data node produced by filtering the Gene Expression data node
- Click the Normalization and scaling section in the toolbox
- Click Normalization
- Click the button
- Change the log base from 2 to e
- Click Finish to run (Figure 11)
...
- Click the Normalized counts data node on the Antibody Capture branch of the pipeline
- Click the Pre-analysis tools section of the toolbox
- Click Merge matrices
- Click Select data node to launch the data node selector
...
Tasks that can form the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 16). We have chosen the Split matrix task as the start and we can choose Merge matrices as the end of the collapsed section.
...
- Name the Collapsed task Data processing
- Click Save (Figure 17)
...
- Double-click the Data processing title bar to re-collapse (Figure 18)
Classify cells using Scatter plot
Now that we have our mRNA and protein data filtered and normalized, we can proceed to identify our cell types. The simplest way to do this is to classify cell types based on their expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone because the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. To use this strategy, we can use a basic scatter plot.
- Click the Merged counts data node
- Click Exploratory analysis in the toolbox
- Click Scatter plot
- Click Finish to run
- Double-click the Scatter plot task node to open it
- Click 2D to switch to a 2D plot style (Figure 19)
Each point on the plot is a single cell. The axes are set to features (genes or proteins) in the data set by default, but can be set to any attribute or feature. On this plot, we can see that CD3_TotalSeqB is on the x-axis and CD4_TotalSeqB is on the y-axis. We can use our selection and filtering tools to perform a basic classification of CD4 and CD8 T cells.
- Click the Features tab in the Selection / Filtering section of the control panel
- Type CD3 in the ID search bar of the Features tab
- Click CD3_TotalSeqB in the drop-down (Figure 20)
...
This will select any cell with <= 2 normalized count for CD3 protein. Selected cells are shown in bold on the plot and, because we have CD3_TotalSeqB on one of our axes, the cut-off point chosen can be easily evaluated (Figure 21).
...
The x-axis now shows CD8a protein expression (Figure 22).
...
This will select the cells in the upper left-hand section of the plot (Figure 23).
...
- Change CD4_TotalSeqB filter to < 1.5
- Change CD8a_TotalSeqB filter to >= 2
This selects the cells in the lower right-hand section of the plot (Figure 24).
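The two filters in this step combine into a simple rule on each cell's normalized protein values. A minimal Python sketch is shown below; the thresholds are the ones set above, while the function name `in_lower_right` and the example cells are hypothetical. Any other filters already active in the control panel are not modeled here.

```python
def in_lower_right(cell):
    """True if a cell has low CD4 and high CD8a protein values.

    cell: dict of normalized protein values keyed by feature ID.
    """
    return cell["CD4_TotalSeqB"] < 1.5 and cell["CD8a_TotalSeqB"] >= 2

# Illustrative cells only
cells = [
    {"CD4_TotalSeqB": 0.4, "CD8a_TotalSeqB": 3.1},  # low CD4, high CD8a
    {"CD4_TotalSeqB": 2.8, "CD8a_TotalSeqB": 0.2},  # high CD4, low CD8a
    {"CD4_TotalSeqB": 1.5, "CD8a_TotalSeqB": 2.0},  # CD4 exactly at the cut-off
]
selected = [in_lower_right(c) for c in cells]
```

Note that the CD4 filter is strict (`<`), so a cell sitting exactly at 1.5 is not selected, while the CD8a filter is inclusive (`>=`).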
...
- Click Clear selection
- Choose Classifications from the Color by drop-down menu (Figure 25).
...
To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial.
Dimensional reduction and clustering with protein expression data
...
If there are fewer than 50 proteins in the data set, all possible PCs (number of proteins - 1) will be used by default and, because using all the PCs will capture all of the variance in the data set, this is equivalent to running clustering on the original data. If your data set has more than 50 proteins and you want to run clustering on the full data instead of a subset of PCs, simply set the number of PCs to All in the advanced settings. We will discuss how to pick an optimal number of PCs for data with larger numbers of features, like gene expression data, in a subsequent section of the tutorial.
Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that place cells with similar expression close together. An advantage of UMAP over t-SNE is that it preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.
...
Each point on the plot is a cell and the cells are colored by their cluster assignments (Figure 26).
...
- Choose Expression from the Color by drop-down menu
- Type CD4 in the search box and choose CD4_TotalSeqB from the drop-down (Figure 27)
Cells that express high levels of CD4 are colored blue on the plot (Figure 28).
...
- Click to activate the lasso tool
- Draw a lasso around the large blue group of cells at the bottom right of the plot to select them (Figure 29)
...
- Choose Graph-based from the Color by drop-down menu (Figure 30)
Again, the colors here indicate the cluster assignment for each cell. Because we ran clustering using only the protein expression data, the cluster assignments are based on each cell's protein expression data. To help identify which cell types the clusters correspond to, we generate a group biomarkers table with every clustering result. Biomarkers are genes or proteins that are expressed highly in a cluster when compared with the other clusters. While the clustering was calculated using only the protein expression data, the biomarkers are drawn from both gene and protein expression data.
...
PD-1 expression is highest in cluster 8 with uniformly high expression throughout the cluster (Figure 31).
...
It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PD-1 gene expression level (PDCD1) (Figure 32).
...
The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8 (Figure 33), so we will classify the cells from cluster 8 as Tfh cells.
...
- Choose Graph-based from the Color by drop-down menu
- Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel (Figure 34)
- Click the check box for 8 to select cluster 8 (Figure 35)
...
- Choose Classifications from the Color by drop-down menu (Figure 36)
Clustering and dimensional reduction with gene expression data
...
Choosing the number of PCs
As we noted before, in this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data.
...
- Click Apply
- Click Finish to run (Figure 37)
...
The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured in Eigenvalue. The higher the Eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering and UMAP.
- Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 38)
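Reading the Scree plot by eye can be complemented with a simple numeric rule. The Python sketch below counts the leading PCs that each explain at least a chosen fraction of the total variance. The cut-off `min_fraction`, the function names, and the eigenvalues are all assumptions for illustration; Partek Flow does not necessarily offer or use such a rule.

```python
def variance_explained(eigenvalues):
    """Fraction of total variance explained by each PC."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def count_informative_pcs(eigenvalues, min_fraction=0.02):
    """Count leading PCs that each explain at least min_fraction of variance.

    Assumes eigenvalues are sorted from largest to smallest, as on a Scree plot.
    """
    count = 0
    for fraction in variance_explained(eigenvalues):
        if fraction < min_fraction:
            break
        count += 1
    return count

# Made-up eigenvalues that level off after the fifth PC
eigenvalues = [40.0, 25.0, 15.0, 10.0, 5.0, 1.5, 1.2, 1.0, 0.8, 0.5]
n_pcs = count_informative_pcs(eigenvalues)
```

A rule like this only approximates the visual "elbow"; the plot itself remains the reference when the curve flattens gradually.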
...
- Click the Merged counts data node
- Click Exploratory analysis in the toolbox
- Click Graph-based clustering
- Click Gene Expression for Include features where "Feature type" is
- Click Configure to access the advanced settings
- Set Number of principal components to 15
- Click Apply
- Click Finish to run (Figure 39)
...
- Click the Clustering result data node
- Click Exploratory analysis in the toolbox
- Click UMAP
- Click Gene Expression for Include features where "Feature type" is
- Click Configure to access the advanced settings
- Set Number of principal components to 15
- Click Apply
- Click Finish to run (Figure 40)
...
The UMAP task report includes a scatter plot with the clustering results coloring the points (Figure 41).
- Click the 2D radio button for Plot style to switch to the 2D UMAP (Figure 42)
...
- Click to activate the lasso tool
- Draw a lasso around clusters 3, 4, and 6 (Figure 43) to select them
- Click to filter to include only the selected cells
- Click (Figure 44)
...
- Choose Expression from the Color by drop-down menu
- Type NKG7 in the search box and choose NKG7 from the drop-down (Figure 45)
- Click to color by a second feature (gene or protein)
- Type CD4 and choose CD4_TotalSeqB from the drop-down (Figure 46)
...
This will color the plot by NKG7 gene expression, CD4 protein expression, and CD3 protein expression. Each feature gets a color channel: green, red, or blue. Cells without expression are black. The mix of green, red, and blue coloring each cell is determined by the relative expression of the three features. Cells expressing both CD4 protein (red) and CD3 protein (blue), but not NKG7 (green), are purple, while cells expressing both NKG7 (green) and CD3 protein (blue) are teal (Figure 47). CD3 is a pan-T cell marker, which indicates that this group of clusters is composed of T cells.
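The color mixing described above can be illustrated with a small Python sketch that maps each feature to one RGB channel. Linear scaling of each value to its maximum is an assumption made here for simplicity; the function name, the maxima, and the example values are hypothetical, and Partek Flow's actual color scaling may differ.

```python
def rgb_for_cell(nkg7, cd4, cd3, max_values):
    """Map three feature values to an (R, G, B) tuple with channels in 0..255.

    Channel assignment follows the text: CD4 is red, NKG7 is green, CD3 is blue.
    """
    def channel(value, maximum):
        # Scale linearly to 0..255 (assumed); guard against an all-zero feature
        return round(255 * value / maximum) if maximum > 0 else 0

    red = channel(cd4, max_values["CD4"])
    green = channel(nkg7, max_values["NKG7"])
    blue = channel(cd3, max_values["CD3"])
    return (red, green, blue)

# Illustrative maxima and cells
max_values = {"NKG7": 4.0, "CD4": 4.0, "CD3": 4.0}
cd4_cd3 = rgb_for_cell(nkg7=0.0, cd4=4.0, cd3=4.0, max_values=max_values)
nkg7_cd3 = rgb_for_cell(nkg7=4.0, cd4=0.0, cd3=4.0, max_values=max_values)
no_expression = rgb_for_cell(0.0, 0.0, 0.0, max_values)
```

Full red plus full blue gives a purple (magenta) cell, full green plus full blue gives a teal (cyan) cell, and a cell with no expression in any channel stays black.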
...
By default, any cell that expresses >= 1 normalized count of NKG7 is now selected (Figure 48).
...
We have now selected only cells that express >= 1 normalized count for NKG7 gene and CD3 protein, but also have <= 2 normalized count for CD4 protein (Figure 49).
...
We have now selected the CD4 positive, CD3 positive, NKG7 negative helper T cells (Figure 50).
...
- Click Clear selection
- Select Classifications from the Color by drop-down menu (Figure 51)
...
The zoom level will also be reset (Figure 52).
...
There are several clusters that show high levels of CD19 protein expression (Figure 53). We can filter to these cells to examine them more closely.
...
- Click to activate the lasso tool
- Draw a lasso around the CD19 protein-expressing clusters to select them
- Click to filter to include only the selected cells
- Click (Figure 54)
...
- Choose Graph-based from the Color by drop-down menu (Figure 55)
...
- Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel
- Click the check box for 7 to select cluster 7
- Click Classify selection (Figure 56)
...
This will color the plot by IGHD and IGHA1 (Figure 57).
...
- Select the left-hand cluster with IGHA1 expression (Figure 58)
...
- Draw a lasso around the right-hand cluster (Figure 59)
...
- Select Classifications from the Color by drop-down menu
- Click Clear filters to view all cells (Figure 60)
...
- Click the Classified groups data node
- Click Filtering
- Click Filter groups
- Set to exclude Classifications is Doublets using the drop-down menus
- Click AND
- Set the second filter to exclude Classifications is N/A using the drop-down menus
- Click Finish to apply the filter (Figure 61)
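The two exclusion criteria joined with AND keep a cell only if its classification is neither Doublets nor N/A. A minimal Python sketch of that logic follows; the dictionary layout, the function name `filter_groups`, and the cell names are hypothetical.

```python
def filter_groups(classifications, excluded=("Doublets", "N/A")):
    """Keep cells whose classification is not in the excluded labels.

    classifications: {cell name: classification label} (hypothetical layout)
    """
    return {
        cell: label
        for cell, label in classifications.items()
        if label not in excluded
    }

# Illustrative classifications
classifications = {
    "cell_1": "Tfh cells",
    "cell_2": "Doublets",
    "cell_3": "N/A",
}
filtered = filter_groups(classifications)
```

Removing unclassified cells and doublets before differential analysis means that every remaining cell belongs to one of the groups being compared.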
This produces a Filtered groups data node (Figure 62).
...
This will produce two data nodes, one for each data type (Figure 63).
...
- Click Finish to run the statistical test (Figure 64)
...
The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 65).
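For readers unfamiliar with FDR adjustment, the Python sketch below shows the Benjamini-Hochberg step-up procedure, which is assumed here to be what the FDR step up column reports. The function name and the p-values are illustrative, and this is not Partek Flow's code.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank of this p-value when sorted ascending
        # Scale by m / rank, then enforce monotonicity with a running minimum
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Illustrative p-values for four features
adjusted = fdr_step_up([0.01, 0.04, 0.03, 0.20])
```

The adjusted value for a feature is never smaller than its raw p-value, and features with larger raw p-values never receive smaller adjusted values.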
...
This opens a violin plot showing CD25 expression for cells in each of the classifications (Figure 66).
...
- Click the Feature list data node
- Click Exploratory analysis in the toolbox
- Click Hierarchical clustering
- Click Finish to run with default settings
- Double-click the Hierarchical clustering task node to open the heat map (Figure 67)
...
This generates a customized heat map to illustrate how the cell types differ in their protein expression (Figure 68).
...
- Double-click the GSA task node to open the task report (Figure 69)
...
Each gene is shown as a point on the plot, with cut-off lines for fold change and p-value or FDR step up set using the control panel on the left (Figure 70). The number of up- and down-regulated genes according to the cut-offs is listed at the bottom of the plot. Mousing over a point shows the gene name and other information.
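The up- and down-regulated tallies can be reproduced with a short Python sketch. It assumes a signed linear fold change (for example, -2.5 for a 2.5-fold decrease) and uses hypothetical cut-offs and gene values; the cut-offs actually used in the tutorial are set in the control panel and are not restated here.

```python
def count_regulated(genes, fold_change_cutoff=2.0, p_cutoff=0.05):
    """Count genes passing the p-value cut-off in each fold change direction.

    genes: list of (signed linear fold change, p-value) pairs.
    Cut-off defaults are hypothetical, chosen only for illustration.
    """
    up = down = 0
    for fold_change, p_value in genes:
        if p_value > p_cutoff:
            continue  # not significant at the chosen cut-off
        if fold_change >= fold_change_cutoff:
            up += 1
        elif fold_change <= -fold_change_cutoff:
            down += 1
    return up, down

# Illustrative (fold change, p-value) pairs
genes = [(3.2, 0.001), (-2.5, 0.01), (1.2, 0.0001), (4.0, 0.3)]
up, down = count_regulated(genes)
```

A gene is counted only when it passes both cut-offs, so a large fold change with a weak p-value, or a strong p-value with a small fold change, is left out of both tallies.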
...
The number at the top of the filter will update to show the number of included genes (Figure 71).
...
The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 72).
...
The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 73).
...