This tutorial presents an outline of the basic series of steps for analyzing a 10x Genomics Gene Expression with Feature Barcoding (antibody) data set in Partek Flow starting with the output of Cell Ranger.

...

A rectangle, or task node, will be created for Split matrix along with two output circles, or data nodes, one for each data type (Figure 2). The labels for these data types are determined by the features.csv file used when processing the data with Cell Ranger. Here, our data is labeled Gene Expression, for the mRNA data, and Antibody Capture, for the protein data.

Figure 2. Split matrix produces two data nodes, one for each data type
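Conceptually, Split matrix groups features by the feature type recorded for each one. The Python sketch below illustrates that grouping and is not Partek Flow code. It assumes the three-column, tab-separated layout of Cell Ranger's `features.tsv` output (feature ID, feature name, feature type). The rows are hypothetical placeholders.

```python
from collections import defaultdict


def split_features_by_type(features_tsv_lines):
    """Group feature names by feature type, the third column of each features.tsv row."""
    groups = defaultdict(list)
    for line in features_tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        _feature_id, feature_name, feature_type = line.split("\t")[:3]
        groups[feature_type].append(feature_name)
    return dict(groups)


# Hypothetical rows; the gene IDs are placeholders, not real Ensembl IDs.
rows = [
    "gene_0001\tCXCL13\tGene Expression",
    "gene_0002\tPDCD1\tGene Expression",
    "CD3\tCD3_TotalSeqB\tAntibody Capture",
    "CD4\tCD4_TotalSeqB\tAntibody Capture",
]

for feature_type, names in split_features_by_type(rows).items():
    print(feature_type, names)
```

Each resulting group corresponds to one output data node. A data set with Gene Expression and Antibody Capture features therefore yields two.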

...

This produces a Single-cell QA/QC task node (Figure 4).

Figure 4. Single-cell QA/QC produces a task node

...

The output is a Filtered single cell counts data node (Figure 6).

Figure 6. Filtered cells output

...

This produces a Single-cell QA/QC task node (Figure 7).

Figure 7. Single-cell QA/QC produces a task node

...

The output is a Filtered single cell counts data node (Figure 9).

Figure 9. There are now two Filtered single cell counts data nodes

...

Normalization produces a Normalized counts data node on the Gene Expression branch of the pipeline (Figure 12).

Figure 12. Both Antibody Capture and Gene Expression data have been normalized

...

Data nodes that can be merged with the Antibody Capture branch Normalized counts data node are shown in color (Figure 13).

Figure 13. Choosing a data node to merge

...

The output is a Merged counts data node (Figure 14). This data node includes the normalized counts of our protein and mRNA data. The intersection of cells from the two input data nodes is retained, so the Merged counts data node includes only cells that passed the quality filter for both the protein and the mRNA data.

Figure 14. Merging data types prior to downstream analysis
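The intersection rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Partek Flow's implementation:

- Each filtered data node is modeled as a hypothetical dictionary that maps a cell barcode to its normalized feature values.
- The shortened barcodes and the values are made up.
- Feature names are assumed not to collide between the protein and mRNA data.

```python
def merge_shared_cells(protein, mrna):
    """Keep only cells present in both inputs and combine their features.

    Each input maps cell barcode -> {feature name: normalized value}.
    Assumes feature names do not collide between the two inputs.
    """
    shared = protein.keys() & mrna.keys()  # intersection of cell barcodes
    return {
        barcode: {**protein[barcode], **mrna[barcode]}
        for barcode in sorted(shared)
    }


protein = {
    "AAACCTGA-1": {"CD3_TotalSeqB": 3.1},
    "AAACCTGC-1": {"CD3_TotalSeqB": 0.4},  # absent from mrna: removed by mRNA QC
}
mrna = {
    "AAACCTGA-1": {"NKG7": 1.7},
    "AAACGGGT-1": {"NKG7": 0.0},  # absent from protein: removed by protein QC
}

print(merge_shared_cells(protein, mrna))
```

Only the cell that survived both quality filters appears in the merged result, carrying both its protein and its mRNA values.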

Collapsing tasks to simplify the pipeline

...

Tasks that can form the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 16). We have chosen the Split matrix task as the start, and we can choose Merge matrices as the end of the collapsed section.

Figure 16. Tasks that can be the start or end of a collapsed task are shown in purple

...

Name the Collapsed task Data processing
Click Save (Figure 17)

Figure 17. Naming the collapsed task

The new collapsed task, Data processing, appears as a single rectangle on the task graph (Figure 18).

Figure 18. Collapsed tasks are represented by a single task node

...

When expanded, the collapsed task is shown as a shaded section of the pipeline with a title bar (Figure 19).

Figure 19. Expanding a collapsed task to show its components

...

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Scatter plot
Click Finish to run
Double-click the Scatter plot task node to open it
Click 2D to switch to a 2D plot style (Figure 20)

Figure 20. Viewing the 2D scatter plot

...

Click the Features tab in the Selection / Filtering section of the control panel
Type CD3 in the ID search bar of the Features tab
Click CD3_TotalSeqB in the drop-down (Figure 21)

Figure 21. Filtering by values for a feature

Click to add a filter for CD3 protein expression
Set the CD3_TotalSeqB filter to >= 2

This will select any cell with >= 2 normalized count for CD3 protein. Selected cells are shown in bold on the plot. Because CD3_TotalSeqB is on one of the axes, the chosen cut-off is easy to evaluate (Figure 22).

Figure 22. CD3+ cells are selected and shown in bold on the plot

...

The x-axis now shows CD8a protein expression (Figure 23).

Figure 23. Switching axes on the scatter plot

...

Type CD4 in the ID search bar of the Features tab
Click CD4_TotalSeqB in the drop-down
Click to add a filter for CD4 protein expression
Set the CD4_TotalSeqB filter to >= 2
Type CD8a in the ID search bar of the Features tab
Click CD8a_TotalSeqB in the drop-down
Click to add a filter for CD8a protein expression
Set the CD8a_TotalSeqB filter to < 2

This will select the cells in the upper left-hand section of the plot (Figure 24).

Figure 24. Selecting CD3+ CD4+ CD8- cells
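Stacking filters in the Features tab keeps only the cells that pass every filter, a logical AND. The Python sketch below applies the three thresholds from the steps above to a few made-up cells. It assumes the positive filters are `>= 2` and the negative filter is `< 2`. It illustrates the gating logic only and is not Partek Flow code.

```python
import operator

# Each gate: (feature, comparison, cut-off). Assumed cut-offs: >= 2 positive, < 2 negative.
CD4_T_CELL_GATES = [
    ("CD3_TotalSeqB", operator.ge, 2.0),
    ("CD4_TotalSeqB", operator.ge, 2.0),
    ("CD8a_TotalSeqB", operator.lt, 2.0),
]


def passes_all(cell, gates):
    """True only if the cell satisfies every gate (filters combine with AND)."""
    return all(compare(cell[feature], cutoff) for feature, compare, cutoff in gates)


# Made-up normalized protein counts for three cells.
cells = {
    "cell_1": {"CD3_TotalSeqB": 3.5, "CD4_TotalSeqB": 2.8, "CD8a_TotalSeqB": 0.3},
    "cell_2": {"CD3_TotalSeqB": 3.2, "CD4_TotalSeqB": 0.5, "CD8a_TotalSeqB": 3.0},
    "cell_3": {"CD3_TotalSeqB": 0.2, "CD4_TotalSeqB": 2.1, "CD8a_TotalSeqB": 0.1},
}

selected = [name for name, cell in cells.items() if passes_all(cell, CD4_T_CELL_GATES)]
print(selected)
```

Reversing the CD4 and CD8a comparisons (CD4 below 2, CD8a at or above 2) describes the CD3+ CD4- CD8+ selection in the same way.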

...

This selects the cells in the lower right-hand section of the plot (Figure 25).

Figure 25. Selecting CD3+ CD4- CD8+ cells

...

Click Clear selection
Choose Classifications from the Color by drop-down menu (Figure 26).

Figure 26. Classified CD4 and CD8 T cells

...

Each point on the plot is a cell, and the cells are colored by their cluster assignments (Figure 27).

Figure 27. UMAP from protein expression data

...

Choose Expression from the Color by drop-down menu
Type CD4 in the search box and choose CD4_TotalSeqB from the drop-down (Figure 28)

Figure 28. Coloring by expression

Cells that express high levels of CD4 are colored blue on the plot (Figure 29).

Figure 29. Coloring by CD4 protein expression

...

Click to activate the lasso tool
Draw a lasso around the large blue group of cells at the bottom right of the plot to select them (Figure 30)

Figure 30. Selecting the CD4 cluster

Click to filter to include only the selected cells
Click to rescale the axes to the included cells

...

Choose Graph-based from the Color by drop-down menu (Figure 31)

Figure 31. Protein-based clustering results

Again, the colors here indicate the cluster assignment for each cell. Because we ran clustering using only the protein expression data, the cluster assignments are based on each cell's protein expression. To help identify which cell types the clusters correspond to, we generate a group biomarkers table with every clustering result. Biomarkers are genes or proteins that are expressed more highly in a cluster than in the other clusters. While the clustering was calculated using only the protein expression data, the biomarkers are drawn from both the gene and the protein expression data.

The far-right cluster, cluster 8, has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular helper T cells (Tfh cells). Two of the other biomarkers are:

- PD-1 protein, which is expressed in Tfh cells, promotes self-tolerance, and is a target for immunotherapy drugs.
- TIGIT protein, another immunotherapy drug target that promotes self-tolerance.
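To make the idea of a cluster biomarker concrete, the Python sketch below ranks features by a fold change. The fold change is the mean expression inside one cluster divided by the mean expression in all other cells, with a small pseudocount added to both. This is a simplified stand-in chosen for illustration. The statistics Partek Flow uses to build the group biomarkers table may differ. The expression values and cell names are made up.

```python
from statistics import mean


def rank_biomarkers(expression, clusters, target, pseudocount=0.1, top_n=3):
    """Rank features by fold change of mean expression: target cluster vs. all other cells.

    expression: {cell: {feature: normalized value}}; clusters: {cell: cluster label}.
    """
    inside = [cell for cell in expression if clusters[cell] == target]
    outside = [cell for cell in expression if clusters[cell] != target]
    features = next(iter(expression.values())).keys()
    scores = {}
    for feature in features:
        mean_in = mean(expression[cell][feature] for cell in inside)
        mean_out = mean(expression[cell][feature] for cell in outside)
        # The pseudocount keeps the ratio finite when a feature is absent outside the cluster.
        scores[feature] = (mean_in + pseudocount) / (mean_out + pseudocount)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up values mixing a gene (CXCL13) and proteins (PD-1, CD19).
expression = {
    "a": {"CXCL13": 4.0, "PD-1_TotalSeqB": 3.0, "CD19_TotalSeqB": 0.0},
    "b": {"CXCL13": 3.0, "PD-1_TotalSeqB": 2.5, "CD19_TotalSeqB": 0.1},
    "c": {"CXCL13": 0.0, "PD-1_TotalSeqB": 0.5, "CD19_TotalSeqB": 2.0},
    "d": {"CXCL13": 0.1, "PD-1_TotalSeqB": 0.3, "CD19_TotalSeqB": 3.0},
}
clusters = {"a": 8, "b": 8, "c": 2, "d": 2}

print(rank_biomarkers(expression, clusters, target=8, top_n=2))
```

Because features from both data types sit in the same matrix, the ranking can surface genes and proteins side by side. This mirrors how the biomarkers table draws on both data types.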

...

PD-1 expression is highest in cluster 8 with high expression throughout the cluster (Figure 32).

Figure 32. PD-1 expression in helper T cells

...

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PDCD1 gene expression level (Figure 33).

Figure 33. PDCD1 (PD-1) gene expression does not form a clear pattern

...

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8 (Figure 34), so we will classify the cells from cluster 8 as Tfh cells.

Figure 34. CXCL13 expression is strong in cluster 8

Choose Graph-based from the Color by drop-down menu
Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel (Figure 35)

Figure 35. Choosing to select by cluster

Click the check box for 8 to select cluster 8 (Figure 36)

Figure 36. Selecting a cluster

Click Classify selection
Name the cells Tfh cells
Click Save

...

Choose Classifications from the Color by drop-down menu (Figure 37)

Figure 37. Coloring by classification

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this later in the tutorial.

...

Click Apply
Click Finish to run (Figure 38)

Figure 38. Configuring PCA to run on the Gene Expression data

...

Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 39)

Figure 39. Identifying an optimal number of PCs
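A scree plot shows how much of the total variance each principal component explains. The NumPy sketch below computes those proportions and applies one simple, hypothetical cut-off rule: stop at the first PC that explains less than a chosen fraction of the variance. Several parts are assumptions for illustration: the 2% threshold, the helper names, and the random data. This is not how Partek Flow draws its plot. In the tutorial, the elbow is judged by eye.

```python
import numpy as np


def explained_variance_ratio(matrix):
    """Fraction of total variance explained by each PC, largest first (cells x features)."""
    centered = matrix - matrix.mean(axis=0)  # center each feature
    # Squared singular values of the centered matrix are proportional to PC variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def pcs_before_elbow(ratios, min_fraction=0.02):
    """Count PCs up to the first one that explains less than min_fraction of the variance."""
    for index, ratio in enumerate(ratios):
        if ratio < min_fraction:
            return index
    return len(ratios)


rng = np.random.default_rng(0)
# Hypothetical data: 200 cells x 30 features driven by 3 strong signals plus noise.
signal = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 30)) * 5
data = signal + rng.normal(size=(200, 30))

ratios = explained_variance_ratio(data)
print(np.round(ratios[:6], 3), pcs_before_elbow(ratios))
```

With three strong underlying signals, the first three PCs carry nearly all the variance and the curve flattens after them. That flattening is the kind of elbow the scree plot helps locate.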

...

Click the Merged counts data node
Click Exploratory analysis in the toolbox
Click Graph-based clustering
Click Gene Expression for Include features where "Feature type" is
Click Configure to access the advanced settings
Set Number of principal components to 15
Click Apply
Click Finish to run (Figure 40)

Figure 40. Running Graph-based clustering on the Gene Expression data

...

The UMAP task report includes a scatter plot with the clustering results coloring the points (Figure 41).

Figure 41. UMAP calculated on Gene Expression values, colored by Graph-based clustering results

...

Choose Expression from the Color by drop-down menu
Type NKG7 in the search box and choose NKG7 from the drop-down (Figure 45)

Figure 45. Coloring by NKG7 expression

This will color the plot by NKG7 gene expression, a marker for cytotoxic cells. We can color by two T cell protein markers to distinguish cytotoxic T cells from helper T cells.

...

By default, any cell that expresses >= 1 normalized count of NKG7 is now selected (Figure 48).

Figure 48. Selecting by NKG7 expression

...

We have now selected only cells that express >= 1 normalized count for NKG7 gene and CD3 protein, but also have <= 2 normalized count for CD4 protein (Figure 49).

Figure 49. Filtering using multiple genes and proteins

...

We have now selected the CD4 positive, CD3 positive, NKG7 negative helper T cells (Figure 50).

Figure 50. Modifying the selection criteria lets us select helper T cells

...

The zoom level will also be reset (Figure 52).

Figure 52. Resetting filters also resets the zoom level

...

There are several clusters that show high levels of CD19 protein expression (Figure 53). We can filter to these cells to examine them more closely.

Figure 53. Viewing CD19 protein expression on the UMAP plot

...

Choose Graph-based from the Color by drop-down menu (Figure 55)

Figure 55. Viewing B lymphocyte clusters

...

This will color the plot by IGHD and IGHA1 (Figure 57).

Figure 57. Coloring by two genes from the Group biomarkers table

...

This produces a Filtered groups data node (Figure 62).

Figure 62. Filter groups output

...

This will produce two data nodes, one for each data type (Figure 63).

Figure 63. Split matrix can also re-split the data

...

The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 65).

Figure 65. GSA report for the protein expression data
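"FDR step up" usually denotes the Benjamini-Hochberg step-up adjustment of p-values. Assuming that is the procedure behind the column, the pure-Python sketch below computes it for a few made-up p-values. It is a reference illustration of the standard procedure, not Partek Flow's code. The values in the task report remain the authoritative ones.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        # Enforce monotonicity: a smaller p-value never gets a larger adjusted value.
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
print(fdr_step_up(p))
```

In this example, the raw p-values 0.04 and 0.03 both adjust to about 0.053. The larger raw value pulls the smaller one up to match, which is the step-up property.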

...

This opens a violin plot showing CD25 expression for cells in each of the classifications (Figure 66).

Figure 66. Violin plot showing CD25 protein expression

...

This generates a customized heat map to illustrate how the cell types differ in their protein expression (Figure 68).

Figure 68. Customized heat map illustrating protein expression differences between cell types

Gene expression

We can use a similar approach to analyze the gene expression data.

...

Each gene is shown as a point on the plot, with cut-off lines for fold change and for p-value or FDR step up set using the control panel on the left (Figure 70). The number of genes up- and down-regulated according to the cut-offs is listed at the bottom of the plot. Mousing over a point shows the gene name and other information.

Figure 70. Volcano plot for Activated vs. Mature B cells
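The up and down counts shown under the volcano plot follow directly from the two cut-offs. The Python sketch below shows the counting under several assumptions:

- Fold change is a signed linear value, so -2 means a two-fold decrease.
- The cut-offs are |fold change| >= 2 and FDR step up <= 0.05. These are example values, not the tutorial's settings.
- The genes and statistics are hypothetical.

```python
def count_regulated(results, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Split genes passing both volcano-plot cut-offs into up- and down-regulated lists.

    results: list of (gene, signed fold change, FDR step up value).
    A signed fold change of -2 means a two-fold decrease.
    """
    up = [gene for gene, fc, fdr in results if fdr <= fdr_cutoff and fc >= fc_cutoff]
    down = [gene for gene, fc, fdr in results if fdr <= fdr_cutoff and fc <= -fc_cutoff]
    return up, down


results = [
    ("GENE_A", 3.5, 0.001),   # hypothetical gene passing both cut-offs (up)
    ("GENE_B", -2.4, 0.01),   # passes both cut-offs (down)
    ("GENE_C", 5.0, 0.20),    # large change but not significant
    ("GENE_D", 1.3, 0.0001),  # significant but small change
]
up, down = count_regulated(results)
print(len(up), "up;", len(down), "down")
```

Applying the same two conditions as a filter is also a way to reduce a GSA report to its significant genes.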

...

The number at the top of the filter will update to show the number of included genes (Figure 71).

Figure 71. Filtering GSA results to significant genes

...

The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 72).

Figure 72. Pathway enrichment task report
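Pathway over-representation is commonly tested with a one-sided Fisher's exact test. That test is equivalent to the upper tail of a hypergeometric distribution. The Python sketch below computes that p-value with `math.comb` and reports -ln(p) as a score. Treating this as Partek Flow's exact enrichment score is an assumption, and the gene counts are made up.

```python
import math


def enrichment_p_value(k, n_list, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value for over-representation.

    k: input-list genes in the pathway; n_list: genes in the input list;
    K: pathway genes in the background; N: genes in the background.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n_list).
    """
    upper = min(K, n_list)
    total = math.comb(N, n_list)
    tail = sum(
        math.comb(K, x) * math.comb(N - K, n_list - x) for x in range(k, upper + 1)
    )
    return tail / total


p = enrichment_p_value(k=3, n_list=5, K=4, N=20)
score = -math.log(p)  # assumed score definition: negative natural log of the p-value
print(p, score)
```

A pathway that contains more genes from the input list than chance predicts gets a small p-value and therefore a large score.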

To get a better idea about the changes in each enriched pathway, we can view an interactive KEGG pathway map.

...

The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 73).

Figure 73. Interactive KEGG pathway map for FoxO signaling pathway

Final pipeline

View of the final pipeline

References

[1] Stoeckius, M., Hafemeister, C., Stephenson, W., Houck-Loomis, B., Chattopadhyay, P. K., Swerdlow, H., ... & Smibert, P. (2017). Simultaneous epitope and transcriptome measurement in single cells. Nature Methods, 14(9), 865.

...

[3] Mimitou, E., Cheng, A., Montalbano, A., Hao, S., Stoeckius, M., Legut, M., ... & Satija, R. (2018). Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay. bioRxiv, 466466.

Additional assistance

...
