Partek Flow Documentation

Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contentschildren


This tutorial presents an outline of the basic series of steps for analyzing a 10x Genomics Gene Expression with Feature Barcoding (antibody) data set in Partek Flow starting with the output of Cell Ranger.  

...

The data set for this tutorial is a demonstration data set from 10x Genomics. The sample includes cells from a dissociated Extranodal Marginal Zone B-Cell Tumor (MALT: Mucosa-Associated Lymphoid Tissue) stained with BioLegend TotalSeq-B antibodies. We are starting with the Feature / cell matrix HDF5 (filtered) produced by Cell Ranger.  

Importing feature barcoding data

...

Numbered figure captions
SubtitleTextConfiguring import of the HDF5 file produced by Cell Ranger 3
AnchorNameConfiguring import

Image Removed

A Single cell counts data node will be created after the file has been imported.

Split matrix

The Single cell counts data node contains two different types of data, mRNA expression and protein expression. So that we can process these two different types of data separately, we will split the data by data type. 

  • Click the Single cell counts data node
  • Click Pre-analysis tools in the toolbox
  • Click Split matrix

A rectangle, or task node, will be created for Split matrix along with two output circles, or data nodes, one for each data type (Figure 2). The labels for these data types are determined by features.csv file used when processing the data with Cell Ranger. Here, our data is labeled Gene Expression, for the mRNA data, and Antibody Capture, for the protein data. 

Numbered figure captions
SubtitleTextSplit matrix produces two data nodes, one for each data type
AnchorNameSplit matrix output

Image Removed

Filter low-quality cells

An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts; these are low-quality cells that can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts. You can do this in Partek Flow using the Single cell QA/QC task. 

We will start with the protein data.

...

Numbered figure captions
SubtitleTextConfiguring Single-cell QA/QC
AnchorNameConfiguring Single-cell QA/QC

Image Removed

This produces a Single-cell QA/QC task node (Figure 4). 

Numbered figure captions
SubtitleTextSingle cell QA/QC produces a task node
AnchorNameOutput of Single cell QA/QC

Image Removed

  • Double-click the Single cell QA/QC task node to open the task report

The task report lists the number of counts per cell and the number of detected features per cell in two violin plots. For more information, please see our documentation for the Single cell QA/QC task. For this analysis, we will set a maximum counts threshold to exclude potential protein aggregates and, because we expect every cell to be bound by several antibodies, we will also set a minimum counts threshold. 

...

Numbered figure captions
SubtitleTextSingle cell QA/QC report - Antibody capture
AnchorNameSingle cell QA/QC report - Antibody capture

Image Removed

  • Click Apply filter to run the Filter cells task

The output is a Filtered single cell counts data node (Figure 6).

Numbered figure captions
SubtitleTextFiltered cells output
AnchorNameFiltered cells by protein count

Image Removed

Next, we can repeat this process for the Gene Expression data node. 

  • Click the Gene Expression data node
  • Click the QA/QC section in the toolbox
  • Click Single Cell QA/QC
  • Choose the assembly and annotation used for the gene expression data (Figure 3) from the drop-down menus
  • Click Finish

This produces a Single-cell QA/QC task node (Figure 7). 

Numbered figure captions
SubtitleTextSingle cell QA/QC produces a task node
AnchorNameOutput of Single cell QA/QC (2)

Image Removed

  • Double-click the Single cell QA/QC task node to open the task report

The task report lists the number of counts per cell, the number of detected features per cell, and the percentage of mitochondrial reads per cell in three violin plots. For this analysis, we will set a maximum counts threshold maximum and minimum thresholds for total counts and detected genes to exclude potential doublets and a maximum mitochondrial reads percentage filter to exclude potential dead or dying cells. 

...

Numbered figure captions
SubtitleTextFiltering low-quality cells by gene expression data
AnchorNameFiltering cells by mRNA data

Image Removed

  • Click Apply filter to run the Filter cells task

The output is a Filtered single cell counts data node (Figure 9).

Numbered figure captions
SubtitleTextThere are now two Filtered single cell counts data nodes
AnchorNameFiltering out low-quality cells

Image Removed

Normalization

After excluding low-quality cells, we can normalize the data. 

We will start with the protein data. We will normalize this data using Centered log-ratio (CLR). CLR was used to normalize antibody capture protein counts data in the paper that introduced CITE-Seq [1] and in subsequent publications on similar assays [2. 3]. CLR normalization includes the following steps: Add 1, Divide by Geometric mean, Add 1, log base e.

...

Numbered figure captions
SubtitleTextPerforming CLR normalization
AnchorNamePerforming CLR normalization

Image Removed

Normalization produces a Normalized counts data node on the Antibody Capture branch of the pipeline. 

Next, we can normalize the mRNA data. We will use the recommended normalization method in Partek Flow, which accounts for differences in library size, or the total number of UMI counts, per cell and log transforms the data. To match the CLR normalization used on the Antibody Capture data, we will use a log e transformation instead of the default log 2.

...

Numbered figure captions
SubtitleTextChoosing CLR normalization
AnchorNameAdding CLR normalization

Image Removed

Normalization produces a Normalized counts data node on the Gene Expression branch of the pipeline (Figure 12). 

Numbered figure captions
SubtitleTextBoth Antibody Capture and Gene Expression data has been normalizied
AnchorNameResults of normalization

 Image Removed

Merge Protein and mRNA data

For quality filtering and normalization, we needed to have the two data types separate as the processing steps were distinct, but for downstream analysis we want to be able to analyze protein and mRNA data together. To bring the two data types back together, we will merge the two normalized counts data nodes.

  • Click the Normalized counts data node on the Antibody Capture branch of the pipeline
  • Click Pre-analysis tools of the toolbox
  • Click Merge matrices
  • Click Select data node to launch the data node selector

Data nodes that can be merged with the Antibody Capture branch Normalized counts data node are shown in color (Figure 13).

Numbered figure captions
SubtitleTextChoosing a data node to merge
AnchorNameData node selector

Image Removed

  • Click the Normalized counts data node on the Gene Expression branch of the pipeline

A black outline will appear around the chosen data node. 

  • Click Select
  • Click Finish to run the task

The output is a Merged counts data node (Figure 14). This data node will include the normalized counts of our protein and mRNA data. The intersection of cells from the two input data nodes is retained so only cells that passed the quality filter for both protein and mRNA data will be included in the Merged counts data node. 

Numbered figure captions
SubtitleTextMerging data types prior to downstream analysis
AnchorNameMerging data types

Image Removed

Collapsing tasks to simplify the pipeline

To simplify the appearance of the pipeline, we can group task nodes into a single collapsed task. Here, we will collapse the filtering and normalization steps.

...

Numbered figure captions
SubtitleTextChoosing the first task node to generate a collapsed task
AnchorNameChoosing Collapse tasks

Image Removed

Tasks that can for the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 16). We have chosen the Split matrix task as the start and we can choose Merge matrices as the end of the collapsed section. 

Numbered figure captions
SubtitleTextTasks that can be the start or end of a collapsed task are shown in purple
AnchorNameViewing options for collapsing

Image Removed

  • Click Merge matrices to choose it as the end of the collapsed section

The section of the pipeline that will form the collapsed task is highlighted in green.

...

Numbered figure captions
SubtitleTextNaming the collapsed task
AnchorNameNaming the collapsed task

Image Removed

The new collapsed task, Data processing, appears as a single rectangle on the task graph (Figure 18). 

Numbered figure captions
SubtitleTextCollapsed tasks are represented by a single task node
AnchorNameCollapsed task

Image Removed

To view the tasks in Data processing, we can expand the collapsed task.

  • Double-click Data processing to expand it

When expanded, the collapsed task is shown as a shaded section of the pipeline with a title bar (Figure 19).

Numbered figure captions
SubtitleTextExpanding a collapsed task to show its components
AnchorNameExpanding a collapsed task

Image Removed

To re-collapse the task, you can double click the title bar or click the Image Removed icon in the title bar. To remove the collapsed task, you can click the Image Removed. Please note that this will not remove tasks, just the grouping.

  • Double-click the Data processing title bar to re-collapse 

Classify cells using Scatter plot

Now that we have our mRNA and protein data filtered and normalized, we can proceed to identify our cell types. The simplest way to do this is classifying cell types based on their expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone as the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. To use this strategy, we can use a basic scatter plot. 

...

Numbered figure captions
SubtitleTextViewing the 2D scatter plot
AnchorName2D scatter plot

Image Removed

Each point on the plot is a single cell. The axes are set to features (gene or protein) in the data set by default, but can be set to any attribute or feature. On this plot, we can see that CD3_TotalSeqB is on the x-axis and CD4_TotalSeqB is on the y-axis. We can use our selection and filtering tools to perform a basic classification of CD4 and CD8 T cells. 

...

Numbered figure captions
SubtitleTextFiltering by values for a feature
AnchorNameFiltering by a feature

Image Removed

  • Click Image Removed to add a filter for CD3 protein expression
  • Set the CD3_TotalSeqB filter to => 2 

This will select any cell with => 2 normalized count for CD3 protein. Selected cells are shown in bold on the plot and, because we have CD3_TotalSeqB on one of our axes, the cut-off point chosen can be easily evaluated (Figure 22). 

Numbered figure captions
SubtitleTextCD3+ cells are selected and shown in bold on the plot
AnchorNameCD3+ cells are selected

Image Removed

The selected CD3+ cells are our T cells. We can filter to these cells prior to performing our classification of CD4 and CD8 T cells sub-types.

  • Click Image Removed to filter to include only the selected cells

Next, we can switch the x-axis to show CD8 protein expression so that we can perform our classification.

  • Click the axis text box in the Plot setup section of the control panel
  • Click CD8a_TotalSeqB from the drop-down list (or type it and then select it if it is not visible)
  • Click Image Removed to rescale the axes to the included cells

The x-axis now shows CD8a protein expression (Figure 23).

Numbered figure captions
SubtitleTextSwitching axes on the scatter plot
AnchorNameSwitching axes

Image Removed

We can now use a set of filters to select and classify the CD3+ CD4+ CD8- T cells.

  • Type CD4 in the ID search bar of the Features tab
  • Click CD4_TotalSeqB in the drop-down
  • Click Image Removed to add a filter for CD4 protein expression
  • Set the CD4_TotalSeqB filter to => 2 
  • Type CD8a in the ID search bar of the Features tab
  • Click CD8a_TotalSeqB in the drop-down
  • Click Image Removed to add a filter for CD8a protein expression
  • Set the CD8a_TotalSeqB filter to < 2 

This will select the cells in the upper left-hand section of the plot (Figure 24). 

Numbered figure captions
SubtitleTextSelecting CD3+ CD4+ CD8- cells
AnchorNameSelecting cells on the scatter plot

Image Removed

  • Click Classify selection 
  • Name the group CD4 T cells
  • Click Save

We can now select and classify CD3+ CD4- CD8+ T cells using the filters we have already created.

  • Change CD4_TotalSeqB filter to < 1.5
  • Change CD8a_TotalSeqB filter to >= 2

This selects the cells in the lower right-hand section of the plot (Figure 25). 

Numbered figure captions
SubtitleTextSelecting CD3+ CD4- CD8+ cells
AnchorNameSelecting CD8 T cells

Image Removed

  • Click Classify selection 
  • Name the group CD8 T cells
  • Click Save

To view our classifications, we can clear the selection and color by classification.

...

Numbered figure captions
SubtitleTextClassified CD4 and CD8 T cells
AnchorNameClassified cells

Image Removed

Alternatively, we could have used the lasso tool Image Removed to select the population of interest manually and then classified the selected cells.

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this later in the tutorial. 

Dimensional reduction and clustering with protein expression data

For experiments like CITE-Seq where we have many protein markers, we can use dimensional reduction and clustering to identify groups of similar cells based on their overall expression pattern. 

In the Merged counts data node, we have two data types. We can choose to use one or both of the data types in our analysis. Here, we will run dimensional reduction and clustering on the protein expression data.

  • Click the Merged counts data node
  • Click Exploratory analysis in the toolbox
  • Click Graph-based clustering 
  • Click Antibody Capture for Include features where "Feature type" is
  • Click Finish to run 

If there are fewer than 50 proteins in the data set, all possible PCs (number of proteins - 1) will be used by default and, because using all the PCs will capture all of the variance in the data set, this is equivalent to running clustering on the original data. If you data set has more than 50 proteins and you want to run clustering on full data instead of a subset of PCs, simply set the number of PCs to All in the advanced settings. We will discuss how to pick an optimal number of PCs for data with larger numbers of features, like gene expression data, later in the tutorial.

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that place cells with similar expression close together. An advantage of UMAP over t-SNE is that is preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.  

  • Click the Clustering result data node 
  • Click Exploratory analysis in the toolbox
  • Click UMAP
  • Click Antibody Capture for Include features where "Feature type" is
  • Click Finish to run

We can open the UMAP task report to view the clustering result.

  • Double-click the UMAP task node
  • Click 2D in the plot style section to switch to 2D

Each point on the plot is a cell and the cells are colored by their cluster assignments (Figure 27).

Numbered figure captions
SubtitleTextUMAP from protein expression data
AnchorNameUMAP on protein expression

Image Removed

Because we merged the gene and protein expression data, we can overlay protein and gene expression values on the plot.

...

Numbered figure captions
SubtitleTextColoring by expression
AnchorNameColoring by a feature

Image Removed

Cells that express high levels of CD4 are colored blue on the plot (Figure 29).

Numbered figure captions
SubtitleTextColoring by CD4 protein expression
AnchorNameColoring by protein expression

Image Removed

The cluster of cells expressing high levels of CD4 are likely our CD4 T cells. We can take a closer look at the CD4 T cell cluster to see if any sub-types can be identified using the clustering results and expression information.

...

Numbered figure captions
SubtitleTextSelecting the CD4 cluster
AnchorNameSelecting CD4 cells

Image Removed

  • Click Image Removed to filter to include only the selected cells
  • Click Image Removed to rescale the axes to the included cells 

With that, let's take a look at the clustering results from the protein expression data for these cells. 

...

Numbered figure captions
SubtitleTextProtein-based clustering results
AnchorNameClustering results from protein data

Image Removed

Again, the colors here indicate the cluster assignment for each cell. Because we ran clustering using only the protein expression data, the cluster assignments are based on each cells protein expression data. To help identify which cell types the clusters correspond to, we generate a group biomarkers table with every clustering result. Biomarkers are genes or proteins that are expressed highly in a clusters when compared with the other clusters. While the clustering was calculated using only the protein expression data, the biomarkers are drawn from both gene and protein expression data. 

The far-right cluster, cluster 8, has several interesting biomarkers. The top biomarker, is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Two of the other biomarkers are PD-1 protein, which is expressed in Tfh cells, promotes self-tolerance, and is a target for immunotherapy drugs; and TIGIT protein, another immunotherapy drug target that promotes self-tolerance. 

To assess the specificity of these biomarkers to this cluster, we can overlay the expression on the scatter plot. 

  • Choose Expression from the Color by drop-down menu
  • Type PD-1 in the search box and choose PD-1_TotalSeqB from the drop-down

PD-1 expression is highest in cluster 8 with high expression throughout the cluster (Figure 32).

Numbered figure captions
SubtitleTextPD-1 expression in helper T cells
AnchorNamePD-1 expression

Image Removed

  • Type PDCD1 in the Expression search box and choose PDCD1 from the drop-down

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PDCD1 gene expression level (Figure 33).

Numbered figure captions
SubtitleTextPDCD1 (PD-1) gene expression does not form a clear pattern
AnchorNamePD-1 gene expression

Image Removed

  • Type CXCL13 in the Expression search box and choose CXCL13 from the drop-down

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8 (Figure 34), so we will classify the cells from cluster 8 as Tfh cells. 

Numbered figure captions
SubtitleTextCXCL13 expression is strong in cluster 8
AnchorNameCXCL13 expression

Image Removed

...

Numbered figure captions
SubtitleTextChoosing to select by cluster
AnchorNameSelecting a cluster

Image Removed

...

Numbered figure captions
SubtitleTextSelecting a cluster
AnchorNameSelecting a cluster

Image Removed

  • Click Classify selection 
  • Name the cells Tfh cells
  • Click Save 

We can classify the remaining cells from this CD4+ group as Helper T cells. 

  • Click Image Removed to invert the selection and select the cells outside of cluster 8
  • Click Classify selection 
  • Name the cells Helper T cells
  • Click Save 

To return to the full data set, we can clear our selection and filter.

  • Click Clear selection
  • Click Clear filters

To visualize our classifications, we can color by Classifications.

...

Numbered figure captions
SubtitleTextColoring by classification
AnchorNameColoring by classification

Image Removed

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this later in the tutorial. 

Clustering and dimensional reduction with gene expression data

Because principal components are used as the input for both graph-based clustering and UMAP when working with gene expression data, it is important to determine an optimal number of PCs to use in downstream analysis. 

Choosing the number of PCs

As we noted before, in this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data. 

  • Click the Merged counts node 
  • Click Exploratory analysis in the task menu
  • Click PCA

Because we have multiple data types, we can choose which we want to use for the PCA calculation. 

  • Click Gene Expression for Include features where "Feature type" is
  • Click Configure to access the advanced settings
  • Click Generate PC quality measures 

This will generate a Scree plot, which is useful for determining how many PCs to use in downstream analysis tasks. 

...

Numbered figure captions
SubtitleTextConfiguring PCA to run on the Gene Expression data
AnchorNameConfiguring PCA

Image Removed

A PCA task node will be produced. 

  • Double-click the PCA task node to open the PCA task report

The PCA task report includes the PCA plot, the Scree plot, the component loadings table, and the PC projections table. To switch between these elements, use the buttons in the upper right-hand corner of the task report Image Removed. Each cell is shown as a dot on the PCA scatter plot. 

  • Click Image Removed to open the Scree plot

The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured in Eigenvalue. The higher the Eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering and UMAP. 

...

Numbered figure captions
SubtitleTextIdentifying an optimal number of PCs
AnchorNameScree plot for Gene Expression data

Image Removed

In this data set, a reasonable cut-off could be set anywhere between around 10 and 30 PCs. We will use 15 in downstream steps. 

Cluster by Gene Expression data

After determining the optimal number of PCs, we can proceed to clustering. 

...

Numbered figure captions
SubtitleTextRunning Graph-based clustering on the Gene Expression data
AnchorNameGraph-based clustering configuration

Image Removed

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP.

...

Numbered figure captions
SubtitleTextRunning UMAP on the Gene Expression data
AnchorNameRunning UMAP

Image Removed

  • Double-click the UMAP task node to open the task report

The UMAP task report includes a scatter plot with the clustering results coloring the points (Figure 41).

Numbered figure captions
SubtitleTextUMAP calculated on Gene Expression values. Colored by Graph-based clustering results.
AnchorNameUMAP results

Image Removed

...

Numbered figure captions
SubtitleTextViewing the 2D UMAP
AnchorName2D UMAP

Image Removed

Classify from expression and clustering

...

Numbered figure captions
SubtitleTextSelecting a group of clusters
AnchorNameSelecting a group of clusters

Image Removed

...

Numbered figure captions
SubtitleTextZooming to a group of clusters in UMAP
AnchorNameViewing one sub-clustering

Image Removed

Because we merged the gene and protein expression data, we can visualize a mix of genes and proteins on the gene expression UMAP.

...

Numbered figure captions
SubtitleTextColoring by NKG7 expression
AnchorNameColoring by a gene

Image Removed

This will color the plot by NKG7 gene expression, a marker for cytotoxic cells. We can color by two T cell protein markers to distinguish cytotoxic T cells from helper T cells. 

...

Numbered figure captions
SubtitleTextColoring by a second feature
AnchorNameColoring by a second feature

Image Removed

This will color the plot by NKG7 gene expression and CD4 protein expression, a marker for helper T cells. We can add a third feature.

  • Click Image Removed to color by a third feature (gene or protein)
  • Type CD3 and choose CD3_TotalSeqB from the drop-down

This will color the plot by NKG7 gene expression, CD4 protein expression, and CD3 protein expression. Each feature gets a color channel, green, red, or blue. Cells without expression are black. The mix of green, red, and blue coloring each cell is determined by the relative expression of the three genes. Cells expressing both CD4 protein (red) and CD3 protein (blue), but not NKG7 (green) are purple, while cells expressing both NKG7 (green) and CD3 protein (blue) are teal (Figure 48). CD3 is a pan-T cells marker, which indicates that this group of clusters is composed of T cells. 

Numbered figure captions
SubtitleTextUMAP colored by gene and protein expression
AnchorNameUMAP colored by three markers

Image Removed

In addition to coloring by the expression of genes and proteins, we can select cells by their expression levels.

  • Click the Features tab in the Selection / Filtering section of the control panel
  • Type NKG7 in the ID search bar of the Features tab
  • Click NKG7 to select it
  • Click Image Removed to add a filter for NKG7 expression

By default, any cell that expresses >= 1 normalized count of NKG7 is now selected (Figure 49).

Numbered figure captions
SubtitleTextSelecting by NKG7 expression
AnchorNameSelecting by NKG7

Image Removed

  • Type CD3 in the ID search bar of the Features tab
  • Click CD3_TotalSeqB in the drop-down
  • Click Image Removed to add a filter for CD3 protein expression

Now, any cell that expresses >= 1 normalized count for NKG7 gene and CD3 protein is selected. You can also require that a cell not express a gene or protein.

  • Type CD4 in the ID search bar of the Features tab
  • Click CD4_TotalSeqB in the drop-down
  • Click Image Removed to add a filter for CD4 protein expression
  • Set the CD4_TotalSeqB filter to <= 2 

We have now selected only cells that express >= 1 normalized count for NKG7 gene and CD3 protein, but also have <= 2 normalized count for CD4 protein (Figure 50).

Numbered figure captions
SubtitleTextFiltering using multiple genes and proteins
AnchorNameFiltering using multiple genes and proteins

Image Removed

We can classify these cells. Because they express the pan T cell maker, CD3, and the cytotoxic marker, NKG7, but not the helper T cell marker, CD4, we can classify these cells as Cytotoxic T cells. 

  • Click Classify selection
  • Type Cytotoxic T cells for the name
  • Click Save

To classify the helper T cells, we can modify the selection criteria. 

  • Set NKG7 to =< 1
  • Set CD4_TotalSeqB to >= 2

We have now selected the CD4 positive, CD3 positive, NKG7 negative helper T cells (Figure 51).

Numbered figure captions
SubtitleTextModifying the selection criteria lets us select helper T cells
AnchorNameSelecting helper T cells

Image Removed

  • Click Classify selection
  • Type Helper T cells for the name
  • Click Save

We can check the results of our classification.

...

Numbered figure captions
SubtitleTextViewing cytotoxic and helper T cell classifications
AnchorNameClassified cells

Image Removed

To return to the full data set, we can clear the filter.

  • Click Clear filters 

The zoom level will also be reset (Figure 53).

Numbered figure captions
SubtitleTextResetting filters also resets the zoom level
AnchorNameReset zoom to show UMAP

Image Removed

In addition to T-cells, we would expect to see B lymphocytes, at least some of which are malignant, in a MALT tumor sample. We can color the plot by expression of a B cell marker to locate these cells on the UMAP plot. 

  • Choose Expression from the Color by drop-down menu
  • Click Image Removed twice to close the second and third genes
  • Type CD19 in the search box
  • Click CD19_TotalSeqB in the drop-down

 There are several clusters that show high levels of CD19 protein expression (Figure 54). We can filter to these cells to examine them more closely.

Numbered figure captions
SubtitleTextViewing CD19 protein expression on the UMAP plot
AnchorNameCD19 expressing cells

Image Removed

...

Numbered figure captions
SubtitleTextFiltering to CD19 expressing clusters
AnchorNameFiltering to B lymphocytes

Image Removed

We can use information from the graph-based clustering results to help us find sub-groups within the CD19 protein-expressing cells.

  • Choose Graph-based from the Color by drop-down menu (Figure 56)
Numbered figure captions
SubtitleTextViewing B lymphocyte clusters
AnchorNameViewing B lymphocyte clusters

Image Removed

Cluster 7, shown in pink, lists IL7R and CD3D, genes typically expressed by T cells, as two of its top biomarkers. Biomarkers are genes or proteins that are expressed highly in a clusters when compared with the other clusters. Therefore, the cells in cluster 7 are likely doublets as they express both B cell (CD19) and T cell (CD3D) markers. 

...

Numbered figure captions
SubtitleTextSelected cluster 7, a group of potential doublets
AnchorNameSelected cluster 7

Image Removed

  • Name the cells Doublets
  • Click Save 
  • Click Image Removed to filter to exclude the selected cells

The biomarkers for clusters 1 and 2 also show an interesting pattern. Cluster 1 lists IGHD as its top biomarker, while cluster 2 lists IGHA1. Both IGHD (Immunoglobulin Heavy Constant Delta) and IGHA1 (Immunoglobulin Heavy Constant Alpha 1) encode classes of the immunoglobulin heavy chain constant region. IGHD is part of IgD, which is expressed by mature B cells, and IGHA1 is part of IgA1, which is expressed by activated B cells. We can color the plot by both of these genes to visualize their expression.

  • Click IGHD in the Group biomarkers table
  • Hold Ctrl on your keyboard and click IGHA1 in the Group biomarkers table

This will color the plot by IGHD and IGHA1 (Figure 58).

Numbered figure captions
SubtitleTextColoring by two genes from the Group biomarkers table
AnchorNameColoring by two biomarkers

Image Removed

The clusters on the left show expression of IGHA1 while the larger or the two clusters on the right expresses IGHD. We can use the lasso tool to classify these populations.

...

Numbered figure captions
SubtitleTextSelecting the IGHA1+ cells
AnchorNameSelecting the plasma cells

Image Removed

  • Click Classify selection
  • Name them Activated B cells 
  • Click Save
  • Double-click any white-space on the plot to clear the selection

We can now classify the cluster that expresses IGHD as mature B cells. 

...

Numbered figure captions
SubtitleTextSelecting the IGHD+ mature B cells
AnchorNameSelecting the IGHD+ cells

Image Removed

  • Click Classify selection
  • Name them Mature B cells
  • Click Save
  • Double-click any white-space on the plot to clear the selection

We can visualize our classifications.

...

Numbered figure captions
SubtitleTextViewing classifications
AnchorNameViewing all cells

Image Removed

To use these classifications in downstream analysis, we can apply the classifications.

  • Click Apply classifications 
  • Click Apply to confirm

This will produce a Classified groups data node. 

Filter groups

Because we have classified our cells, we can now filter based on those classifications. This can be used to focus on a single cell type for re-clustering and sub-classification or to exclude cells that are not of interest for downstream analysis.

...

Numbered figure captions
SubtitleTextFiltering groups to exclude cell types
AnchorNameFiltering groups

Image Removed

This produces a Filtered counts data node (Figure 63).

Numbered figure captions
SubtitleTextFilter groups output
AnchorNameFIlter groups output

Image Removed

Re-split the matrix

Prior to performing differential analysis, you may want to separate your protein and gene expression data. The split data nodes will both retain cluster and classification information.

  • Click the Classified groups data node
  • Click Pre-analysis tools
  • Click Split matrix

This will produce two data nodes, one for each data type (Figure 64).

Numbered figure captions
SubtitleTextSplit matrix can also re-split the data
AnchorNameSplit matrix

Image Removed

Differential analysis and visualization

Once we have classified our cells, we can use this information to perform comparisons between cell types or between experimental groups for a cell type. In this project, we only have a single sample, so we will compare cell types. 

Protein expression

  • Click the Antibody Capture data node
  • Click Differential analysis 
  • Click GSA

The first step is to choose which attributes we want to consider in the statistical test. 

  • Check Classifications to include it in the statistical test
  • Click Next 

Next, we will set up the comparison we want to make. Here, we will compare the Activated and Mature B cells.

  • Check Activated B cells in the top panel
  • Check Mature B cells in the bottom panel 
  • Click Add comparison

The comparison should appear in the table as Activated B cells vs. Mature B cells.

...

Numbered figure captions
SubtitleTextSetting up a comparison in the GSA task
AnchorNameAdding comparisons

Image Removed

The GSA task produces a Feature list data node.

  • Double-click the GSA task node to open the task report

The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 66).

Numbered figure captions
SubtitleTextGSA report for the protein expression data
AnchorNameGSA report

Image Removed

In addition to the listed information, we can access dot and violin plots for each gene or protein from this table.

  • Click Image Removed in the CD25_TotalSeqB row

This opens a violin plot showing CD25 expression for cells in each of the classifications (Figure 67).

Numbered figure captions
SubtitleTextViolin plot showing CD25 protein expression
AnchorNameViolin plot

Image Removed

Please see the Dot Plot documentation page to learn more about this visualization.

  • Click the project name to return to the Analyses tab

To visualize all of the proteins at the same time, we can make a hierarchical clustering heat map.

...

Numbered figure captions
SubtitleTextHeat map prior to customization
AnchorNameHeat map prior to configuration

Image Removed

The heat map can easily be customized to illustrate our results.

  • Click Image Removed to transpose the heat map
  • Set High to 2.8 to match the low range
  • Set the Sample dendrogram to By sample attribute Classifications
  • Set Attributes to Classifications
  • Click Image Removed and set Rotation to 0
  • Uncheck Samples under Show labels

This generates a customized heat map to illustrate how the cell types differ in their protein expression (Figure 69).

Numbered figure captions
SubtitleTextCustomized heat map illustrating protein expression differences between cell types
AnchorNameMALT heat map

Image Removed

Gene expression

We can use a similar approach to analyze the gene expression data.

  • Click the Gene Expression data node
  • Click Differential analysis 
  • Click GSA
  • Check Classifications to include it in the statistical test
  • Click Next 
  • Check Activated B cells in the top panel
  • Check Mature B cells in the bottom panel 
  • Click Add comparison 
  • Click Finish to run the statistical test

As before, this will generate a GSA task node and a Feature list data node.

...

Numbered figure captions
SubtitleTextResults of differential gene expression analysis
AnchorNameGene expression analysis results

Image Removed

Because 19,745 genes have been analyzed, it is useful to use a volcano plot to get an idea about the overall changes.

  • Click Image Removed to open a volcano plot

Each gene is shown as a point on the plot with cut-off lines for fold change and p-value or FDR step up set using the control panel on the left (Figure 71). The number of genes up and down regulated according to the cut-offs is listed at the bottom of the plot. Mousing over a point shows the gene name and other information. 

Numbered figure captions
SubtitleTextVolcano plot for Activated vs. Mature B cells
AnchorNameVolcano plot

Image Removed

  • Click GSA report to return to the full report

We can filter the full set of genes to include only the significantly different genes using the filter panel on the left.

  • Click FDR step up 
  • Type 0.05 for the cutoff and press Enter on your keyboard or click the check box
  • Click Fold change 
  • Set to From -2 to 2 and press Enter on your keyboard or click the check box

The number at the top of the filter will update to show the number of included genes (Figure 72).

Numbered figure captions
SubtitleTextFiltering GSA results to significant genes
AnchorNameFiltered GSA results

 Image Removed

  • Click Generate filtered node to create a data node including only these significantly different genes

A task, Differential analysis filter, will run and generate a new Feature list data node. We can get a better idea about the biology underlying these gene expression changes using gene set or pathway enrichment. 

  • Click the filtered Feature list data node
  • Click Biological interpretation in the toolbox
  • Click Pathway enrichment 
  • Make sure that Homo sapiens is selected in the Species drop-down menu
  • Click Finish to run
  • Double-click the Pathway enrichment task node to open the task report

The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 73).

Numbered figure captions
SubtitleTextPathway enrichment task report
AnchorNamePathway enrichment task report

Image Removed

To get a better idea about the changes in each enriched pathway, we can view an interactive KEGG pathway map.

  • Click path:hsa04068 in the FoxO signaling pathway row

The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 734). 

Numbered figure captions
SubtitleTextInteractive KEGG pathway map for FoxO signaling pathway
AnchorNameFoxO signaling pathway

Image Removed

Final pipeline

Numbered figure captions
SubtitleTextView of the final pipeline
AnchorNameView of the final pipeline

Image Removed

References

[1] Stoeckius, M., Hafemeister, C., Stephenson, W., Houck-Loomis, B., Chattopadhyay, P. K., Swerdlow, H., ... & Smibert, P. (2017). Simultaneous epitope and transcriptome measurement in single cells. Nature methods, 14(9), 865.

[2] Stoeckius, M., Zheng, S., Houck-Loomis, B., Hao, S., Yeung, B. Z., Mauck, W. M., ... & Satija, R. (2018). Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome biology, 19(1), 224.

[3] Mimitou, E., Cheng, A., Montalbano, A., Hao, S., Stoeckius, M., Legut, M., ... & Satija, R. (2018). Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay. bioRxiv, 466466.

Prior to beginning, you can either transfer this file to your Partek Flow server, download the file to your local machine, or simply copy the URL to your clipboard.


Action Button
linkImporting Feature Barcoding Data
textImporting Feature Barcoding Data >


Additional assistance



Rate Macro