Partek Flow Documentation

...

The Single cell counts data node contains two different types of data: mRNA expression and protein expression. So that we can process these two types of data separately, we will split the data by data type.

  • Click the Single cell counts data node
  • Click the Pre-analysis tools section in the toolbox
  • Click Split matrix
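
Conceptually, splitting by data type partitions the feature columns of the count matrix by their feature type. The sketch below is only an illustration of that idea, not Partek Flow's implementation; the function name and the cells-by-features layout are assumptions, and the feature type labels are taken from the data node names used in this tutorial.

```python
import numpy as np

def split_by_feature_type(matrix, feature_types):
    """Split a cells x features matrix into one sub-matrix per feature type.

    Hypothetical helper: `feature_types` holds one label per column,
    e.g. "Gene Expression" or "Antibody Capture".
    """
    matrix = np.asarray(matrix)
    feature_types = np.asarray(feature_types)
    # Keep only the columns whose label matches each distinct feature type
    return {str(t): matrix[:, feature_types == t] for t in np.unique(feature_types)}
```

Each returned sub-matrix keeps all cells, so the two data types can later be merged back together cell by cell.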

...

An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts; these are low-quality cells that can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts. You can do this in Partek Flow using the Single cell QA/QC task.

...

  • Click the Antibody Capture data node
  • Click the QA/QC section in the toolbox
  • Click Single Cell QA/QC
  • Choose the assembly and annotation used for the gene expression data (Figure 3) from the drop-down menus
  • Click Finish

...

  • Set the Counts filter to Keep cells between 1500 and 15000 
  • Set the Detected genes filter to Keep cells between 400 and 4000
  • Set the Mitochondrial counts filter to Keep cells between 0% and 20% (Figure 8)

 


Figure 8. Filtering low-quality cells by gene expression data
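
The three filters above can be sketched as per-cell metrics computed from a raw count matrix. This is a minimal illustration, not the QA/QC task itself: the function name, the inclusive bounds, and the convention that mitochondrial genes start with "MT-" are all assumptions; only the threshold values come from the steps above.

```python
import numpy as np

def qc_keep_mask(counts, gene_names,
                 counts_range=(1500, 15000),
                 genes_range=(400, 4000),
                 max_mito_pct=20.0,
                 mito_prefix="MT-"):
    """Return a boolean mask of cells (rows) that pass all three filters.

    Assumptions: `counts` is cells x genes, bounds are inclusive, and
    mitochondrial genes are identified by the `mito_prefix` naming convention.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)                      # total counts per cell
    detected = (counts > 0).sum(axis=1)             # detected genes per cell
    is_mito = np.array([g.startswith(mito_prefix) for g in gene_names])
    # Percentage of each cell's counts coming from mitochondrial genes
    mito_pct = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(total, 1.0)
    return (
        (total >= counts_range[0]) & (total <= counts_range[1])
        & (detected >= genes_range[0]) & (detected <= genes_range[1])
        & (mito_pct <= max_mito_pct)
    )
```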

...

We will start with the protein data. We will normalize this data using Centered log-ratio (CLR). CLR was used to normalize antibody capture protein counts data in the paper that introduced CITE-Seq (Stoeckius et al. 2017) and in subsequent publications on similar assays (Stoeckius et al. 2018, Mimitou et al. 2018). CLR normalization includes the following steps: Add 1, Divide by Geometric mean, Add 1, log base e.

  • Click the Filtered single cell counts data node produced by filtering the Antibody Capture data node
  • Click the Normalization and scaling section in the toolbox
  • Click Normalization
  • Click the green plus next to CLR or drag CLR to the right-hand panel
  • Click Finish to run (Figure 10)
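
The CLR steps listed above (Add 1, Divide by Geometric mean, Add 1, log base e) can be written out as a short sketch. One point is an assumption here: the geometric mean is taken per cell across all proteins after the first Add 1 step; the text does not state which axis is used, and the function name is hypothetical.

```python
import numpy as np

def clr_normalize(counts):
    """Apply the four CLR steps to a cells x proteins array of raw counts.

    Assumption: the geometric mean is computed per cell (row).
    """
    x = np.asarray(counts, dtype=float) + 1.0                 # Add 1
    # Geometric mean per cell, computed in log space for numerical stability
    gmean = np.exp(np.log(x).mean(axis=1, keepdims=True))
    return np.log(x / gmean + 1.0)                            # Divide, Add 1, log base e
```

With this ordering, a protein whose count equals the cell's geometric mean maps to ln(2), and the second Add 1 keeps every output value positive.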

...

  • Click the Filtered single cell counts data node produced by filtering the Gene Expression data node
  • Click the Normalization and scaling section in the toolbox
  • Click Normalization
  • Click the Image Modified button 
  • Change the log base from 2 to e
  • Click Finish to run (Figure 11)

...

  • Click the Normalized counts data node on the Antibody Capture branch of the pipeline
  • Click the Pre-analysis tools section of the toolbox
  • Click Merge matrices
  • Click Select data node to launch the data node selector

...

Tasks that can form the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 16). We have chosen the Split matrix task as the start and we can choose Merge matrices as the end of the collapsed section.

 

 

Figure 16. Tasks that can be the start or end of a collapsed task are shown in purple

...

  • Name the Collapsed task Data processing
  • Click Save (Figure 17)

 

Figure 17. Naming the collapsed task

The new collapsed task, Data processing, appears as a single rectangle on the task graph (Figure 18). 

...

  • Double-click the Data processing title bar to re-collapse (Figure 18)

Classify cells using Scatter plot

Now that we have our mRNA and protein data filtered and normalized, we can proceed to identify our cell types. The simplest way to do this is to classify cell types based on their expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone because the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. To use this strategy, we can use a basic scatter plot.

  • Click the Merged counts data node
  • Click Exploratory analysis in the toolbox
  • Click Scatter plot
  • Click Finish to run 
  • Double-click the Scatter plot task node to open it
  • Click 2D to switch to a 2D plot style (Figure 19)

 

Figure 19. Viewing the 2D scatter plot

Each point on the plot is a single cell. The axes are set to features (gene or protein) in the data set by default, but can be set to any attribute or feature. On this plot, we can see that CD3_TotalSeqB is on the x-axis and CD4_TotalSeqB is on the y-axis. We can use our selection and filtering tools to perform a basic classification of CD4 and CD8 T cells.

  • Click the Features tab in the Selection / Filtering section of the control panel
  • Type CD3 in the ID search bar of the Features tab
  • Click CD3_TotalSeqB in the drop-down (Figure 20)

Figure 20. Filtering by values for a feature

...

This will select any cell with <= 2 normalized count for CD3 protein. Selected cells are shown in bold on the plot and, because we have CD3_TotalSeqB on one of our axes, the cut-off point chosen can be easily evaluated (Figure 21).

 

Figure 21. CD3+ cells are selected and shown in bold on the plot

...

The x-axis now shows CD8a protein expression (Figure 22).

 

Figure 22. Switching axes on the scatter plot

...

This will select the cells in the upper left-hand section of the plot (Figure 23).

 

Figure 23. Selecting CD3+ CD4+ CD8- cells

...

  • Change CD4_TotalSeqB filter to < 1.5
  • Change CD8a_TotalSeqB filter to >= 2

This selects the cells in the lower right-hand section of the plot (Figure 24).

 

Figure 24. Selecting CD3+ CD4- CD8+ cells
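
Combining several feature filters amounts to intersecting one threshold rule per feature. The sketch below illustrates this with the two filter values given above (CD4_TotalSeqB < 1.5 and CD8a_TotalSeqB >= 2); the function name, the data layout, and the example values are hypothetical.

```python
import numpy as np

def select_cells(expr, rules):
    """Return a boolean mask of cells that satisfy every rule.

    expr:  dict mapping feature name -> per-cell normalized values
    rules: list of (feature, operator, threshold) tuples (hypothetical format)
    """
    ops = {"<": np.less, "<=": np.less_equal,
           ">": np.greater, ">=": np.greater_equal}
    n_cells = len(next(iter(expr.values())))
    mask = np.ones(n_cells, dtype=bool)
    for feature, op, threshold in rules:
        # A cell stays selected only if it passes this rule and all previous ones
        mask &= ops[op](np.asarray(expr[feature], dtype=float), threshold)
    return mask

# Filter values from the steps above; the expression values are made up
cd8_rules = [("CD4_TotalSeqB", "<", 1.5), ("CD8a_TotalSeqB", ">=", 2)]
example = {"CD4_TotalSeqB": [0.5, 3.0, 1.0], "CD8a_TotalSeqB": [2.5, 0.2, 1.0]}
cd8_mask = select_cells(example, cd8_rules)
```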

...

  • Click Clear selection 
  • Choose Classifications from the Color by drop-down menu (Figure 25)

Figure 25. Classified CD4 and CD8 T cells

...

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial.

Dimensional reduction and clustering with protein expression data

...

If there are fewer than 50 proteins in the data set, all possible PCs (number of proteins - 1) will be used by default and, because using all the PCs will capture all of the variance in the data set, this is equivalent to running clustering on the original data. If your data set has more than 50 proteins and you want to run clustering on the full data instead of a subset of PCs, simply set the number of PCs to All in the advanced settings. We will discuss how to pick an optimal number of PCs for data with larger numbers of features, like gene expression data, in a subsequent section of the tutorial.
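
The equivalence between clustering on all PCs and clustering on the original data can be checked numerically: projecting onto every PC is a rotation, so cell-to-cell distances do not change, while keeping only a few PCs does change them. The sketch below uses synthetic data and plain NumPy; the matrix size and function names are assumptions made for illustration only.

```python
import numpy as np

def pca_scores(X, n_components=None):
    """PC scores of a cells x features matrix via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S
    return scores if n_components is None else scores[:, :n_components]

def pairwise_distances(A):
    """Euclidean distance between every pair of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))                        # synthetic: 200 cells, 17 proteins
d_original = pairwise_distances(X)
d_all_pcs = pairwise_distances(pca_scores(X))         # every PC kept
d_five_pcs = pairwise_distances(pca_scores(X, 5))     # only the first 5 PCs
```

Because distance-based methods such as graph-based clustering only see the distances between cells, identical distance matrices imply the same input to clustering.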

Once Graph-based clustering has finished running and produced a Clustering result data node, we can visualize the results using UMAP or t-SNE. Both are dimensional reduction techniques that place cells with similar expression close together. An advantage of UMAP over t-SNE is that it preserves more of the global structure of the data. This means that with UMAP, more similar clusters are closer together while dissimilar clusters are further apart. With t-SNE, the relative positions of clusters to each other are often uninformative.

...

Each point on the plot is a cell and the cells are colored by their cluster assignments (Figure 26).

 

Figure 26. UMAP from protein expression data

...

  • Choose Expression from the Color by drop-down menu
  • Type CD4 in the search box and choose CD4_TotalSeqB from the drop-down (Figure 27)

Figure 27. Coloring by expression

Cells that express high levels of CD4 are colored blue on the plot (Figure 28).

 

Figure 28. Coloring by CD4 protein expression

...

  • Click  to activate the lasso tool
  • Draw a lasso around the large blue group of cells at the bottom right of the plot to select them (Figure 29)

Figure 29. Selecting the CD4 cluster

...

  • Choose Graph-based from the Color by drop-down menu (Figure 30)

 


Figure 30. Protein-based clustering results

Again, the colors here indicate the cluster assignment for each cell. Because we ran clustering using only the protein expression data, the cluster assignments are based on each cell's protein expression data. To help identify which cell types the clusters correspond to, we generate a group biomarkers table with every clustering result. Biomarkers are genes or proteins that are expressed highly in a cluster when compared with the other clusters. While the clustering was calculated using only the protein expression data, the biomarkers are drawn from both gene and protein expression data.

...

PD-1 expression is highest in cluster 8 with uniformly high expression throughout the cluster (Figure 31).

 

Figure 31. PD-1 expression in helper T cells

...

It is interesting to note that this pattern of PD-1 expression is not easily discernible at the PDCD1 gene expression level (Figure 32).

 

Figure 32. PDCD1 (PD-1) gene expression does not form a clear pattern

...

The Tfh cell marker, CXCL13, is highly and specifically expressed in cluster 8 (Figure 33), so we will classify the cells from cluster 8 as Tfh cells. 

...

  • Choose Graph-based from the Color by drop-down menu
  • Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel (Figure 34)

Figure 34. Choosing to select by cluster

  • Click the check box to select cluster 8 (Figure 35)

Figure 35. Selecting a cluster

...

  • Choose Classifications from the Color by drop-down menu (Figure 36)

 

Figure 36. Coloring by classification

To apply the classification so that it would be available in downstream tasks like differential analysis, we would click Apply classifications. Classifications that are not applied are not available in downstream analysis tasks, but are saved in a draft state on the task report where they were created. Here, we will not save the classification, but we will see how to do this in a subsequent section of the tutorial.

Clustering and dimensional reduction with gene expression data

...

Choosing the number of PCs

In this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data.

...

  • Click Apply 
  • Click Finish to run (Figure 37)

Figure 37. Configuring PCA to run on the Gene Expression data

...

The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured as an Eigenvalue. The higher the Eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering and UMAP.

  • Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 38)

Figure 38. Identifying an optimal number of PCs
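
The quantities shown on a Scree plot can be reproduced as the eigenvalues of the feature covariance matrix, sorted from largest to smallest. The sketch below also includes one simple, hypothetical rule for locating where the curve levels off (count the leading PCs that each explain at least 5% of the total variance); in the tutorial this point is judged by eye, so the rule and its threshold are assumptions, and the data are synthetic.

```python
import numpy as np

def scree_eigenvalues(X):
    """Eigenvalues of the feature covariance matrix, largest first."""
    cov = np.cov(np.asarray(X, dtype=float), rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]

def n_informative_pcs(eigenvalues, min_fraction=0.05):
    """Count leading PCs that each explain at least `min_fraction` of the variance.

    Hypothetical heuristic standing in for a visual reading of the Scree plot.
    """
    fractions = eigenvalues / eigenvalues.sum()
    below = np.nonzero(fractions < min_fraction)[0]
    return int(below[0]) if below.size else len(eigenvalues)

# Synthetic data: three high-variance directions followed by seven noisy ones
rng = np.random.default_rng(0)
scales = np.array([10, 8, 6, 1, 1, 1, 1, 1, 1, 1], dtype=float)
X = rng.normal(size=(500, 10)) * scales
eigs = scree_eigenvalues(X)
k = n_informative_pcs(eigs)
```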

...

  • Click the Merged counts data node
  • Click Exploratory analysis in the toolbox
  • Click Graph-based clustering 
  • Click Gene Expression for Include features where "Feature type" is
  • Click Configure to access the advanced settings
  • Set Number of principal components to 15
  • Click Apply 
  • Click Finish to run (Figure 39)

Figure 39. Running Graph-based clustering on the Gene Expression data

...

  • Click the Clustering result data node 
  • Click Exploratory analysis in the toolbox
  • Click UMAP
  • Click Gene Expression for Include features where "Feature type" is
  • Click Configure to access the advanced settings
  • Set Number of principal components to 15
  • Click Apply 
  • Click Finish to run (Figure 40)

Figure 40. Running UMAP on the Gene Expression data

...

The UMAP task report includes a scatter plot with the clustering results coloring the points (Figure 41).

 

Figure 41. UMAP calculated on Gene Expression values. Colored by Graph-based clustering results.

  • Click the 2D radio button for Plot style to switch to the 2D UMAP (Figure 42)

Figure 42. Viewing the 2D UMAP

...

  • Click  to activate the lasso tool
  • Draw a lasso around clusters 3, 4, and 6 (Figure 43) to select them

Figure 43. Selecting a group of clusters

  • Click  to filter to include only the selected cells
  • Click  to rescale the axes to the included cells (Figure 44)

Figure 44. Zooming to a group of clusters in UMAP

...

  • Choose Expression from the Color by drop-down menu
  • Type NKG7 in the search box and choose NKG7 from the drop-down (Figure 45)

Figure 45. Coloring by NKG7 expression

This will color the plot by NKG7 gene expression, a marker for cytotoxic cells. We can color by two T cell protein markers to distinguish cytotoxic T cells from helper T cells. 

  • Click  to color by a second feature (gene or protein)
  • Type CD4 and choose CD4_TotalSeqB from the drop-down (Figure 46)

Figure 46. Coloring by a second feature

...

This will color the plot by NKG7 gene expression, CD4 protein expression, and CD3 protein expression. Each feature gets a color channel: green, red, or blue. Cells without expression are black. The mix of green, red, and blue coloring each cell is determined by the relative expression of the three features. Cells expressing both CD4 protein (red) and CD3 protein (blue), but not NKG7 (green), are purple, while cells expressing both NKG7 (green) and CD3 protein (blue) are teal (Figure 47). CD3 is a pan-T cell marker, which helps confirm that this group of clusters is composed of T cells.
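
The color mixing described above can be sketched by mapping each feature to one RGB channel. How the plot scales expression to color intensity is not stated, so dividing each feature by its maximum is an assumption, as are the function name and the example values.

```python
import numpy as np

def rgb_by_expression(red, green, blue):
    """Return a cells x 3 array of RGB values in [0, 1], one channel per feature.

    Assumption: each feature is scaled by its own maximum across cells.
    """
    def scale(values):
        values = np.asarray(values, dtype=float)
        top = values.max()
        return values / top if top > 0 else np.zeros_like(values)

    return np.stack([scale(red), scale(green), scale(blue)], axis=1)

# Made-up values for three cells: CD4 protein (red), NKG7 gene (green), CD3 protein (blue)
colors = rgb_by_expression(red=[2.0, 0.0, 0.0],
                           green=[0.0, 3.0, 0.0],
                           blue=[1.0, 1.0, 0.0])
```

In this example the first cell (CD4 and CD3, no NKG7) comes out purple, the second (NKG7 and CD3) teal, and the third (no expression) black.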

...

By default, any cell that expresses >= 1 normalized count of NKG7 is now selected (Figure 48).

 

Figure 48. Selecting by NKG7 expression

...

We have now selected only cells that express >= 1 normalized count for NKG7 gene and CD3 protein, but also have <= 2 normalized count for CD4 protein (Figure 49).

 

Figure 49. Filtering using multiple genes and proteins

...

We have now selected the CD4 positive, CD3 positive, NKG7 negative helper T cells (Figure 50).

 

Figure 50. Modifying the selection criteria lets us select helper T cells

...

  • Click Clear selection
  • Select Classifications from the Color by drop-down menu (Figure 51)

Figure 51. Viewing cytotoxic and helper T cell classifications

...

The zoom level will also be reset (Figure 52).

 

Figure 52. Resetting filters also resets the zoom level

...

There are several clusters that show high levels of CD19 protein expression (Figure 53). We can filter to these cells to examine them more closely.

...

  • Click  to activate the lasso tool
  • Draw a lasso around the CD19 protein-expressing clusters to select them
  • Click  to filter to include only the selected cells
  • Click  to rescale the axes to the included cells (Figure 54)

Figure 54. Filtering to CD19 expressing clusters

...

  • Choose Graph-based from the Color by drop-down menu (Figure 55)

 

Figure 55. Viewing B lymphocyte clusters

...

  • Choose Graph-based from the Select by drop-down in the Attributes tab of the Selection / Filtering section of the control panel 
  • Click the check box to select cluster 7
  • Click Classify selection (Figure 56)

Figure 56. Selected cluster 7, a group of potential doublets

...

This will color the plot by IGHD and IGHA1 (Figure 57).

 

Figure 57. Coloring by two genes from the Group biomarkers table

...

  • Select the left-hand cluster with IGHA1 expression (Figure 58)

Figure 58. Selecting the IGHA1+ cells

...

  • Draw a lasso around the right-hand cluster (Figure 59)

Figure 59. Selecting the IGHD+ mature B cells

...

  • Select Classifications from the Color by drop-down menu
  • Click Clear filters to view all cells (Figure 60)

Figure 60. Viewing classifications

...

  • Click the Classified groups data node
  • Click Filtering 
  • Click Filter groups
  • Set to exclude Classifications is Doublets using the drop-down menus
  • Click AND
  • Set the second filter to exclude Classifications is N/A using the drop-down menus 
  • Click Finish to apply the filter (Figure 61)

Figure 61. Filtering groups to exclude cell types
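
The two exclusion filters joined with AND keep only cells whose classification is neither Doublets nor N/A. A minimal sketch of that logic follows; the function name and the cell type labels in the example are hypothetical, and this is not the Filter groups task itself.

```python
def filter_groups(classifications, exclude=("Doublets", "N/A")):
    """Return the indices of cells whose classification is not excluded."""
    return [i for i, label in enumerate(classifications) if label not in exclude]

# Hypothetical labels for four cells
kept = filter_groups(["Helper T cells", "Doublets", "N/A", "Mature B cells"])
```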

This produces a Filtered groups data node (Figure 62).

 

Figure 62. Filter groups output

...

This will produce two data nodes, one for each data type (Figure 63).

 

Figure 63. Split matrix can also re-split the data

...

  • Click Finish to run the statistical test (Figure 64)

Figure 64. Setting up a comparison in the GSA task

...

The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 65).

 

Figure 65. GSA report for the protein expression data
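
For readers unfamiliar with step-up FDR adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure. It is an assumption that the FDR step up column corresponds to this procedure; the function name is hypothetical and the code is not taken from Partek Flow.

```python
import numpy as np

def fdr_step_up(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Step up: enforce monotonicity starting from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```

Each adjusted value is at least as large as the raw p-value, so fewer features pass a given cut-off after adjustment.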

...

This opens a violin plot showing CD25 expression for cells in each of the classifications (Figure 66).

 

Figure 66. Violin plot showing CD25 protein expression

...

  • Click the Feature list data node
  • Click Exploratory analysis in the toolbox
  • Click Hierarchical clustering 
  • Click Finish to run with default settings
  • Double-click the Hierarchical clustering task node to open the heat map (Figure 67)

Figure 67. Heat map prior to customization

...

This generates a customized heat map to illustrate how the cell types differ in their protein expression (Figure 68).

 

Figure 68. Customized heat map illustrating protein expression differences between cell types

...

  • Double-click the GSA task node to open the task report (Figure 69)

Figure 69. Results of differential gene expression analysis

...

Each gene is shown as a point on the plot with cut-off lines for fold change and p-value or FDR step up set using the control panel on the left (Figure 70). The number of genes up- and down-regulated according to the cut-offs is listed at the bottom of the plot. Mousing over a point shows the gene name and other information.
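
The up- and down-regulated counts under the plot follow from applying both cut-offs to every gene. The sketch below assumes a signed linear fold change (for example, -2 for a two-fold decrease) and uses hypothetical cut-off values of 2 and 0.05; neither the function name nor the cut-offs come from the tutorial.

```python
def count_regulated(fold_changes, pvalues, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (n_up, n_down) for genes passing both cut-offs.

    Assumption: fold changes are signed, so down-regulated genes are negative.
    """
    up = down = 0
    for fc, p in zip(fold_changes, pvalues):
        if p > p_cutoff:
            continue                      # not significant at this cut-off
        if fc >= fc_cutoff:
            up += 1
        elif fc <= -fc_cutoff:
            down += 1
    return up, down

# Made-up values for five genes
n_up, n_down = count_regulated([3.0, -2.5, 1.2, -4.0, 5.0],
                               [0.01, 0.001, 0.0001, 0.2, 0.04])
```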

...

The number at the top of the filter will update to show the number of included genes (Figure 71).

 

Figure 71. Filtering GSA results to significant genes

 

...

The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 72).

 

Figure 72. Pathway enrichment task report

To get a better idea about the changes in each enriched pathway, we can view an interactive KEGG pathway map.

...

The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 73). 

 

Figure 73. Interactive KEGG pathway map for FoxO signaling pathway

...