...

The Single cell counts data node contains two different types of data, mRNA expression and protein expression. So that we can process these two different types of data separately, we will split the data by data type.

Click the Single cell counts data node
Click the Pre-analysis tools section in the toolbox
Click Split matrix

...

An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts; these are low-quality cells that can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts. You can do this in Partek Flow using the Single cell QA/QC task.

...

Click the Antibody Capture data node
Click the QA/QC section in the toolbox
Click Single Cell QA/QC
Choose the assembly and annotation used for the gene expression data (Figure 3) from the drop-down menus
Click Finish

...

Set the Counts filter to Keep cells between 1500 and 15000
Set the Detected genes filter to Keep cells between 400 and 4000
Set the Mitochondrial counts filter to Keep cells between 0% and 20% (Figure 8)

Figure 8. Filtering low-quality cells by gene expression data
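The three filters above amount to keeping a cell only when every metric falls inside its range. A minimal sketch of that rule outside Partek Flow might look like the following. The dictionary keys, the example cells, and the inclusive treatment of the range bounds are assumptions made for illustration; this is not Partek Flow's implementation.

```python
# Ranges copied from the filter settings above; the key names are hypothetical.
FILTERS = {
    "counts": (1500, 15000),
    "detected_genes": (400, 4000),
    "mito_percent": (0.0, 20.0),
}


def passes_qc(cell, filters=FILTERS):
    """Return True when every metric lies inside its (assumed inclusive) range."""
    return all(low <= cell[name] <= high for name, (low, high) in filters.items())


# Made-up cells for illustration only.
cells = [
    {"id": "cell_a", "counts": 5200, "detected_genes": 1800, "mito_percent": 4.1},
    {"id": "cell_b", "counts": 900, "detected_genes": 350, "mito_percent": 2.0},
    {"id": "cell_c", "counts": 7000, "detected_genes": 2500, "mito_percent": 35.0},
]

kept = [cell["id"] for cell in cells if passes_qc(cell)]
print(kept)
```

Here cell_b fails the counts and detected genes filters and cell_c fails the mitochondrial filter, so only cell_a is kept.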

...

We will start with the protein data. We will normalize this data using Centered log-ratio (CLR). CLR was used to normalize antibody capture protein counts data in the paper that introduced CITE-Seq (Stoeckius et al. 2017) and in subsequent publications on similar assays (Stoeckius et al. 2018, Mimitou et al. 2018). CLR normalization includes the following steps: Add 1, Divide by Geometric mean, Add 1, log base e.

Click the Filtered single cell counts data node produced by filtering the Antibody Capture data node
Click the Normalization and scaling section in the toolbox
Click Normalization
Click the green plus next to CLR or drag CLR to the right-hand panel
Click Finish to run (Figure 10)
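The CLR steps described above (Add 1, Divide by Geometric mean, Add 1, log base e) can be sketched in a few lines of Python. This is one illustrative reading of those steps, not Partek Flow's code: it assumes the geometric mean is taken across the proteins within each cell after adding 1, and the example counts are made up.

```python
import math


def clr_normalize(counts):
    """Apply the four CLR steps to the raw antibody counts of one cell."""
    # Step 1: add 1 so that zero counts do not break the logarithm.
    shifted = [c + 1 for c in counts]
    # Step 2: divide by the geometric mean (computed as exp of the mean log).
    # Assumption: the mean is taken over the proteins of this one cell.
    geo_mean = math.exp(sum(math.log(v) for v in shifted) / len(shifted))
    ratios = [v / geo_mean for v in shifted]
    # Steps 3 and 4: add 1, then take the natural log.
    return [math.log(r + 1) for r in ratios]


cell_counts = [0, 9, 99]  # hypothetical raw counts for three proteins
normalized = clr_normalize(cell_counts)
print([round(v, 4) for v in normalized])
```

In this example the shifted counts are 1, 10, and 100, whose geometric mean is 10, so the middle protein ends up at ln(1 + 1).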

...

Click the Filtered single cell counts data node produced by filtering the Gene Expression data node
Click the Normalization and scaling section in the toolbox
Click Normalization
Click the Image Modified button
Change the log base from 2 to e
Click Finish to run (Figure 11)

...

Click the Normalized counts data node on the Antibody Capture branch of the pipeline
Click the Pre-analysis tools section in the toolbox
Click Merge matrices
Click Select data node to launch the data node selector

...

Tasks that can form the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 16). We have chosen the Split matrix task as the start and we can choose Merge matrices as the end of the collapsed section.

Figure 16. Tasks that can be the start or end of a collapsed task are shown in purple

...

Name the Collapsed task Data processing
Click Save (Figure 17)

Figure 17. Naming the collapsed task

The new collapsed task, Data processing, appears as a single rectangle on the task graph (Figure 18).

...

Double-click the Data processing title bar to re-collapse

Classify cells using Scatter plot

Now that we have our mRNA and protein data filtered and normalized, we can proceed to identify our cell types. The simplest way to do this is classifying cell types based on their expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone as the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. To use this strategy, we can use a basic scatter plot.

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Scatter plot
Click Finish to run
Double-click the Scatter plot task node to open it
Click 2D to switch to a 2D plot style (Figure 19)

Figure 19. Viewing the 2D scatter plot

Each point on the plot is a single cell. The axes are set to features (gene or protein) in the data set by default, but can be set to any attribute or feature. On this plot, we can see that CD3_TotalSeqB is on the x-axis and CD4_TotalSeqB is on the y-axis. We can use our selection and filtering tools to perform a basic classification of CD4 and CD8 T cells.

Click the Features tab in the Selection / Filtering section of the control panel
Type CD3 in the ID search bar of the Features tab
Click CD3_TotalSeqB in the drop-down (Figure 20)

Figure 20. Filtering by values for a feature

...

This will select any cell with >= 2 normalized count for CD3 protein. Selected cells are shown in bold on the plot and, because we have CD3_TotalSeqB on one of our axes, the cut-off point chosen can be easily evaluated (Figure 21).

Figure 21. CD3+ cells are selected and shown in bold on the plot

...

The x-axis now shows CD8a protein expression (Figure 22).

Figure 22. Switching axes on the scatter plot

...

This will select the cells in the upper left-hand section of the plot (Figure 23).

Figure 23. Selecting CD3+ CD4+ CD8- cells

...

Change CD4_TotalSeqB filter to < 1.5
Change CD8a_TotalSeqB filter to >= 2

This selects the cells in the lower right-hand section of the plot (Figure 24).

Figure 24. Selecting CD3+ CD4- CD8+ cells
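The selections made on the scatter plot are threshold rules on normalized protein values, so they can be written down as a small function. The sketch below is illustrative only. The CD4 < 1.5 and CD8a >= 2 cut-offs for the CD8 gate come from the steps above; the CD3 cut-off of 2, the mirrored cut-offs for the CD4 gate, and the class labels are assumptions, since those steps are not shown here.

```python
def classify_t_cell(cd3, cd4, cd8a):
    """Return a class label for one cell, or None if it matches neither gate."""
    # Assumed gate: CD3+ means at least 2 normalized counts of CD3 protein.
    if cd3 < 2:
        return None
    # Gate from the steps above: CD4 < 1.5 and CD8a >= 2.
    if cd4 < 1.5 and cd8a >= 2:
        return "CD8 T cells"  # hypothetical label
    # Assumed mirror image of the gate above for CD3+ CD4+ CD8- cells.
    if cd4 >= 1.5 and cd8a < 2:
        return "CD4 T cells"  # hypothetical label
    return None


# Made-up normalized values: (CD3, CD4, CD8a) per cell.
example_cells = [(3.0, 2.4, 0.3), (3.2, 0.4, 2.8), (0.5, 2.0, 0.1), (3.0, 2.0, 2.5)]
labels = [classify_t_cell(*cell) for cell in example_cells]
print(labels)
```

Cells that are CD3 negative or positive for both CD4 and CD8a are left unclassified, which mirrors leaving them unselected on the plot.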

...

Click Clear selection
Choose Classifications from the Color by drop-down menu (Figure 25)

Figure 25. Classified CD4 and CD8 T cells

...

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial.

Dimensional reduction and clustering with protein expression data

...

If there are fewer than 50 proteins in the data set, all possible PCs (number of proteins - 1) will be used by default and, because using all the PCs will capture all of the variance in the data set, this is equivalent to running clustering on the original data. If your data set has more than 50 proteins and you want to run clustering on the full data instead of a subset of PCs, simply set the number of PCs to All in the advanced settings. We will discuss how to pick an optimal number of PCs for data with larger numbers of features, like gene expression data, in a subsequent section of the tutorial.

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that place cells with similar expression close together. An advantage of UMAP over t-SNE is that it preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.

...

Each point on the plot is a cell and the cells are colored by their cluster assignments (Figure 26).

Figure 26. UMAP from protein expression data

...

Choose Expression from the Color by drop-down menu
Type CD4 in the search box and choose CD4_TotalSeqB from the drop-down (Figure 27)

Figure 27. Coloring by expression

Cells that express high levels of CD4 are colored blue on the plot (Figure 28).

Figure 28. Coloring by CD4 protein expression

...

Click to activate the lasso tool
Draw a lasso around the large blue group of cells at the bottom right of the plot to select them (Figure 29)

Figure 29. Selecting the CD4 cluster

...

Choose Graph-based from the Color by drop-down menu (Figure 30)

Figure 30. Protein-based clustering results

Again, the colors here indicate the cluster assignment for each cell. Because we ran clustering using only the protein expression data, the cluster assignments are based on each cell's protein expression data. To help identify which cell types the clusters correspond to, we generate a group biomarkers table with every clustering result. Biomarkers are genes or proteins that are expressed highly in a cluster when compared with the other clusters. While the clustering was calculated using only the protein expression data, the biomarkers are drawn from both gene and protein expression data.

...

PD-1 expression is highest in cluster 8 with uniformly high expression throughout the cluster (Figure 31).

Figure 31. PD-1 expression in helper T cells

...

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PDCD1 gene expression level (Figure 32).

Figure 32. PDCD1 (PD-1) gene expression does not form a clear pattern

...

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8 (Figure 33), so we will classify the cells from cluster 8 as Tfh cells.

...

Choose Graph-based from the Color by drop-down menu
Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel (Figure 34)

Figure 34. Choosing to select by cluster

Click the check box for 8 to select cluster 8 (Figure 35)

Figure 35. Selecting a cluster

...

Choose Classifications from the Color by drop-down menu (Figure 36)

Figure 36. Coloring by classification

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial.

Clustering and dimensional reduction with gene expression data

...

Choosing the number of PCs

As we noted before, in this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data.

...

Click Apply
Click Finish to run (Figure 37)

Figure 37. Configuring PCA to run on the Gene Expression data

...

The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured in Eigenvalue. The higher the Eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering and UMAP.

Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 38)

Figure 38. Identifying an optimal number of PCs
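Reading the point where the Scree plot levels off is a visual judgment, but the idea can be approximated with a simple rule. The sketch below is one possible heuristic, not how Partek Flow chooses PCs: it keeps PCs until the drop in eigenvalue to the next PC falls below a fraction of the first eigenvalue. The function name, the min_gain threshold, and the eigenvalues are all hypothetical.

```python
def pick_num_pcs(eigenvalues, min_gain=0.05):
    """Return how many PCs to keep before the scree curve flattens.

    eigenvalues must be sorted from largest to smallest. A PC is the last
    one kept when the drop to the next PC is smaller than min_gain times
    the first eigenvalue (an assumed, adjustable rule of thumb).
    """
    top = eigenvalues[0]
    for i in range(len(eigenvalues) - 1):
        drop = eigenvalues[i] - eigenvalues[i + 1]
        if drop < min_gain * top:
            return i + 1
    return len(eigenvalues)


# Made-up eigenvalues with a clear elbow after the fourth PC.
eigenvalues = [40.0, 22.0, 12.0, 7.0, 6.5, 6.2, 6.0]
n_pcs = pick_num_pcs(eigenvalues)
print(n_pcs)
```

A rule like this is only a starting point; checking the plot by eye, as in the step above, remains the more reliable way to confirm the choice.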

...

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Click Gene Expression for Include features where "Feature type" is
Click Configure to access the advanced settings
Set Number of principal components to 15
Click Apply
Click Finish to run (Figure 39)

Figure 39. Running Graph-based clustering on the Gene Expression data

...

Click the Clustering result data node
Click Exploratory analysis in the toolbox
Click UMAP
Click Gene Expression for Include features where "Feature type" is
Click Configure to access the advanced settings
Set Number of principal components to 15
Click Apply
Click Finish to run (Figure 40)

Figure 40. Running UMAP on the Gene Expression data

...

The UMAP task report includes a scatter plot with the clustering results coloring the points (Figure 41).

Figure 41. UMAP calculated on Gene Expression values. Colored by Graph-based clustering results.

Click the 2D radio button for Plot style to switch to the 2D UMAP (Figure 42)

Figure 42. Viewing the 2D UMAP

...

Click to activate the lasso tool
Draw a lasso around clusters 3, 4, and 6 (Figure 43) to select them

Figure 43. Selecting a group of clusters

Click to filter to include only the selected cells
Click to rescale the axes to the included cells (Figure 44)

Figure 44. Zooming to a group of clusters in UMAP

...

Choose Expression from the Color by drop-down menu
Type NKG7 in the search box and choose NKG7 from the drop-down (Figure 45)

Figure 45. Coloring by NKG7 expression

This will color the plot by NKG7 gene expression, a marker for cytotoxic cells. We can color by two T cell protein markers to distinguish cytotoxic T cells from helper T cells.

Click to color by a second feature (gene or protein)
Type CD4 and choose CD4_TotalSeqB from the drop-down (Figure 46)

Figure 46. Coloring by a second feature

...

This will color the plot by NKG7 gene expression, CD4 protein expression, and CD3 protein expression. Each feature gets a color channel, green, red, or blue. Cells without expression are black. The mix of green, red, and blue coloring each cell is determined by the relative expression of the three features. Cells expressing both CD4 protein (red) and CD3 protein (blue), but not NKG7 (green) are purple, while cells expressing both NKG7 (green) and CD3 protein (blue) are teal (Figure 47). CD3 is a pan-T cell marker, which indicates that this group of clusters is composed of T cells.
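To make the color mixing concrete, the sketch below maps three expression values to one RGB color, with CD4 on the red channel, NKG7 on green, and CD3 on blue as described above. The linear scaling of each value by that feature's maximum is an assumption made for illustration, and the function and example values are hypothetical; Partek Flow's actual scaling may differ.

```python
def rgb_for_cell(nkg7, cd4, cd3, max_nkg7, max_cd4, max_cd3):
    """Return an (R, G, B) tuple in 0-255 for one cell."""

    def channel(value, max_value):
        # Scale linearly to 0-255 and clamp; a feature with no signal stays 0.
        if max_value <= 0:
            return 0
        fraction = min(max(value / max_value, 0.0), 1.0)
        return round(255 * fraction)

    red = channel(cd4, max_cd4)      # CD4 protein
    green = channel(nkg7, max_nkg7)  # NKG7 gene
    blue = channel(cd3, max_cd3)     # CD3 protein
    return (red, green, blue)


# Made-up expression values; the maxima are 5 (NKG7), 4 (CD4), and 3 (CD3).
helper = rgb_for_cell(nkg7=0, cd4=4, cd3=3, max_nkg7=5, max_cd4=4, max_cd3=3)
cytotoxic = rgb_for_cell(nkg7=5, cd4=0, cd3=3, max_nkg7=5, max_cd4=4, max_cd3=3)
no_expression = rgb_for_cell(nkg7=0, cd4=0, cd3=0, max_nkg7=5, max_cd4=4, max_cd3=3)
print(helper, cytotoxic, no_expression)
```

Red plus blue gives the purple of the CD4 and CD3 positive cells, green plus blue gives teal, and a cell with no expression of any of the three features stays black.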

...

By default, any cell that expresses >= 1 normalized count of NKG7 is now selected (Figure 48).

Figure 48. Selecting by NKG7 expression

...

We have now selected only cells that express >= 1 normalized count for NKG7 gene and CD3 protein, but also have <= 2 normalized count for CD4 protein (Figure 49).

Figure 49. Filtering using multiple genes and proteins

...

We have now selected the CD4 positive, CD3 positive, NKG7 negative helper T cells (Figure 50).

Figure 50. Modifying the selection criteria lets us select helper T cells

...

Click Clear selection
Select Classifications from the Color by drop-down menu (Figure 51)

Figure 51. Viewing cytotoxic and helper T cell classifications

...

The zoom level will also be reset (Figure 52).

Figure 52. Resetting filters also resets the zoom level

...

There are several clusters that show high levels of CD19 protein expression (Figure 53). We can filter to these cells to examine them more closely.

...

Click to activate the lasso tool
Draw a lasso around the CD19 protein-expressing clusters to select them
Click to filter to include only the selected cells
Click to rescale the axes to the included cells (Figure 54)

Figure 54. Filtering to CD19 expressing clusters

...

Choose Graph-based from the Color by drop-down menu (Figure 55)

Figure 55. Viewing B lymphocyte clusters

...

Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel
Click the check box for 7 to select cluster 7
Click Classify selection (Figure 56)

Figure 56. Selected cluster 7, a group of potential doublets

...

This will color the plot by IGHD and IGHA1 (Figure 57).

Figure 57. Coloring by two genes from the Group biomarkers table

...

Select the left-hand cluster with IGHA1 expression (Figure 58)

Figure 58. Selecting the IGHA1+ cells

...

Draw a lasso around the right-hand cluster (Figure 59)

Figure 59. Selecting the IGHD+ mature B cells

...

Select Classifications from the Color by drop-down menu
Click Clear filters to view all cells (Figure 60)

Figure 60. Viewing classifications

...

Click the Classified groups data node
Click Filtering
Click Filter groups
Set to exclude Classifications is Doublets using the drop-down menus
Click AND
Set the second filter to exclude Classifications is N/A using the drop-down menus
Click Finish to apply the filter (Figure 61)

Figure 61. Filtering groups to exclude cell types

This produces a Filtered groups data node (Figure 62).

Figure 62. Filter groups output

...

This will produce two data nodes, one for each data type (Figure 63).

Figure 63. Split matrix can also re-split the data

...

Click Finish to run the statistical test (Figure 64)

Figure 64. Setting up a comparison in the GSA task

...

The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 65).

Figure 65. GSA report for the protein expression data
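For readers who want to see what a step-up FDR adjustment does to a list of p-values, here is a minimal sketch. It assumes that "FDR step up" refers to the Benjamini-Hochberg step-up procedure; the function name and the p-values are hypothetical, and this is not Partek Flow's code.

```python
def fdr_step_up(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.20]  # made-up p-values for four features
adjusted = fdr_step_up(p_values)
print([round(v, 4) for v in adjusted])
```

Note that the second and third features end up with the same adjusted value, because the procedure never lets a smaller raw p-value receive a larger adjusted value than a bigger one.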

...

This opens a violin plot showing CD25 expression for cells in each of the classifications (Figure 66).

Figure 66. Violin plot showing CD25 protein expression

...

Click the Feature list data node
Click Exploratory analysis in the toolbox
Click Hierarchical clustering
Click Finish to run with default settings
Double-click the Hierarchical clustering task node to open the heat map (Figure 67)

Figure 67. Heat map prior to customization

...

This generates a customized heat map to illustrate how the cell types differ in their protein expression (Figure 68).

Figure 68. Customized heat map illustrating protein expression differences between cell types

...

Double-click the GSA task node to open the task report (Figure 69)

Figure 69. Results of differential gene expression analysis

...

Each gene is shown as a point on the plot with cut-off lines for fold change and p-value or FDR step up set using the control panel on the left (Figure 70). The number of up- and down-regulated genes according to the cut-offs is listed at the bottom of the plot. Mousing over a point shows the gene name and other information.
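The counts shown under the plot follow from applying both cut-offs to every gene. The sketch below illustrates that logic under stated assumptions: fold change is treated as a signed linear value (for example 2 for up and -2 for down), the default cut-offs of 2 and 0.05 are placeholders, and the gene records are made up. It is not a description of how Partek Flow computes these numbers.

```python
def count_regulated(genes, fc_cutoff=2.0, p_cutoff=0.05):
    """Count genes passing both cut-offs, split by direction of change."""
    up = 0
    down = 0
    for gene in genes:
        if gene["p_value"] > p_cutoff:
            continue  # not significant, regardless of fold change
        if gene["fold_change"] >= fc_cutoff:
            up += 1
        elif gene["fold_change"] <= -fc_cutoff:
            down += 1
    return up, down


# Hypothetical results for four genes.
genes = [
    {"name": "gene_a", "fold_change": 3.1, "p_value": 0.001},
    {"name": "gene_b", "fold_change": -2.6, "p_value": 0.01},
    {"name": "gene_c", "fold_change": 4.0, "p_value": 0.30},
    {"name": "gene_d", "fold_change": 1.2, "p_value": 0.002},
]

up, down = count_regulated(genes)
print(up, down)
```

Here gene_c is excluded by its p-value and gene_d by its small fold change, leaving one up-regulated and one down-regulated gene.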

...

The number at the top of the filter will update to show the number of included genes (Figure 71).

Figure 71. Filtering GSA results to significant genes

...

The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 72).

Figure 72. Pathway enrichment task report

To get a better idea about the changes in each enriched pathway, we can view an interactive KEGG pathway map.

...

The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 73).

Figure 73. Interactive KEGG pathway map for FoxO signaling pathway

...
