Partek Flow Documentation

...

Figure: Filtering low-quality cells by gene expression data

  • Click Apply filter to run the Filter cells task

...

We will start with the protein data, which we will normalize using Centered log-ratio (CLR). CLR was used to normalize antibody capture protein counts in the paper that introduced CITE-Seq (Stoeckius et al. 2017) and in subsequent publications on similar assays (Stoeckius et al. 2018, Mimitou et al. 2018). CLR normalization includes the following steps: add 1, divide by the geometric mean, add 1, take the log base e.
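The steps above can be sketched in a few lines of numpy; this is an illustrative reimplementation of the listed arithmetic, not Partek Flow's internal code.

```python
import numpy as np

def clr_normalize(counts):
    """Sketch of CLR following the steps above: add 1, divide by the
    geometric mean, add 1 again, take the natural log.
    `counts` is a 1-D array of antibody counts for a single cell.
    Illustrative only, not Partek Flow's implementation."""
    shifted = counts + 1
    geo_mean = np.exp(np.mean(np.log(shifted)))  # geometric mean of shifted counts
    return np.log(shifted / geo_mean + 1)
```

Because the geometric mean is taken per cell, the transform expresses each antibody count relative to that cell's overall antibody signal.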

  • Click the Filtered single cell counts data node produced by filtering the Antibody Capture data node
  • Click the Normalization and scaling section in the toolbox
  • Click Normalization
  • Click the green plus next to CLR (Figure 10) or drag CLR to the right-hand panel
  • Click Finish to run (Figure 10)

Figure 10. Performing CLR normalization

Normalization produces a Normalized counts data node on the Antibody Capture branch of the pipeline. 

Next, we can normalize the mRNA data. We will use the recommended normalization method in Partek Flow, which accounts for differences in library size (the total number of UMI counts per cell) and log transforms the data. To match the CLR normalization used on the Antibody Capture data, we will use a log base e transformation instead of the default log base 2.
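As a rough sketch of this style of normalization (the counts-per-million scale factor here is an illustrative assumption, not necessarily Partek Flow's exact default):

```python
import numpy as np

def libsize_lognorm(counts, scale=1e6):
    """Sketch of library-size normalization with a natural-log transform.
    `counts` is a cells x genes matrix of UMI counts. The 1e6 (counts per
    million) scale factor is an assumption for illustration."""
    totals = counts.sum(axis=1, keepdims=True)  # total UMI count per cell
    return np.log1p(counts / totals * scale)    # log base e, matching CLR
```

Dividing by each cell's total before scaling removes library-size differences, and `log1p` applies the same natural-log transform chosen in the dialog.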

  • Click the Filtered single cell counts data node produced by filtering the Gene Expression data node
  • Click the Normalization and scaling section in the toolbox
  • Click Normalization
  • Click the button to add the recommended normalization method
  • Change the log base from 2 to e
  • Click Finish to run (Figure 11)

Figure 11. Choosing CLR normalization

Normalization produces a Normalized counts data node on the Gene Expression branch of the pipeline (Figure 12). 

 

Figure 12. Both Antibody Capture and Gene Expression data have been normalized

Merge Protein and mRNA data

For quality filtering and normalization, we needed to keep the two data types separate because the processing steps were distinct, but for downstream analysis we want to analyze protein and mRNA data together. To bring the two data types back together, we will merge the two normalized counts data nodes.

  • Click the Normalized counts data node on the Antibody Capture branch of the pipeline
  • Click the Pre-analysis tools section of the toolbox
  • Click Merge matrices
  • Click Select data node to launch the data node selector

Data nodes that can be merged with the Antibody Capture branch Normalized counts data node are shown in color (Figure 13).

 

Figure 13. Choosing a data node to merge

  • Click the Normalized counts data node on the Gene Expression branch of the pipeline

A black outline will appear around the chosen data node. 

  • Click Select
  • Click Finish to run the task

The output is a Merged counts data node (Figure 14). This data node includes the normalized counts of our protein and mRNA data. The intersection of cells from the two input data nodes is retained, so only cells that passed the quality filter for both protein and mRNA data are included in the Merged counts data node.
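The intersection behavior can be illustrated with a toy example; the cell barcodes and feature names below are invented for the demonstration, not taken from the tutorial data.

```python
import pandas as pd

# Two normalized matrices with partially overlapping cells (rows)
protein = pd.DataFrame({"CD3_protein": [1.2, 0.8]}, index=["cell1", "cell2"])
mrna = pd.DataFrame({"CD3E": [2.1, 1.5]}, index=["cell2", "cell3"])

# Inner join = intersection of cell barcodes, mirroring Merge matrices
merged = protein.join(mrna, how="inner")
print(list(merged.index))  # only cell2 passed both filters
```

cell1 (protein-only) and cell3 (mRNA-only) are dropped; cell2 keeps both its protein and mRNA values in the merged matrix.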

 

Figure 14. Merging data types prior to downstream analysis

Collapsing tasks to simplify the pipeline

To simplify the appearance of the pipeline, we can group task nodes into a single collapsed task. Here, we will collapse the filtering and normalization steps.

  • Right-click the Split matrix task node 
  • Choose Collapse tasks from the pop-up dialog (Figure 15)

Figure 15. Choosing the first task node to generate a collapsed task

Tasks that can form the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 16). We have chosen the Split matrix task as the start, and we can choose Merge matrices as the end of the collapsed section.

 

 

Figure 16. Tasks that can be the start or end of a collapsed task are shown in purple

  • Click Merge matrices to choose it as the end of the collapsed section

The section of the pipeline that will form the collapsed task is highlighted in green.

  • Name the Collapsed task Data processing
  • Click Save (Figure 17)

 

Figure 17. Naming the collapsed task

The new collapsed task, Data processing, appears as a single rectangle on the task graph (Figure 18). 

 

Figure 18. Collapsed tasks are represented by a single task node

To view the tasks in Data processing, we can expand the collapsed task.

  • Double-click Data processing to expand it

When expanded, the collapsed task is shown as a shaded section of the pipeline with a title bar (Figure 19).

 

Figure 19. Expanding a collapsed task to show its components

To re-collapse the task, you can double-click the title bar or click the collapse icon in the title bar. To remove the collapsed task, you can click the remove icon. Please note that this will not remove the tasks themselves, just the grouping.

  • Double-click the Data processing title bar to re-collapse

 

 

Choosing the number of PCs

In this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data. 

  • Click the Merged counts node 
  • Click Exploratory analysis in the task menu
  • Click PCA

Because we have multiple data types, we can choose which we want to use for the PCA calculation. 
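Conceptually, restricting the calculation to one feature type just subsets the merged matrix by feature type before the decomposition. A rough numpy sketch, with invented dimensions and labels:

```python
import numpy as np

# Illustrative merged matrix: 100 cells x 60 features, of which the first
# 50 are Gene Expression and the last 10 are Antibody Capture. The sizes
# and labels are made up, not the tutorial data.
rng = np.random.default_rng(0)
merged = rng.normal(size=(100, 60))
feature_type = np.array(["Gene Expression"] * 50 + ["Antibody Capture"] * 10)

# Keep only Gene Expression features, then center before PCA
expr = merged[:, feature_type == "Gene Expression"]
expr = expr - expr.mean(axis=0)

# Eigenvalues of the covariance matrix = variance explained per PC
eigvals = np.linalg.eigvalsh(np.cov(expr, rowvar=False))[::-1]
```

Sorting the eigenvalues in descending order puts PC1 first, which is the ordering the Scree plot uses.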

  • Click Gene Expression for Include features where "Feature type" is
  • Click Configure to access the advanced settings
  • Click Generate PC quality measures 

This will generate a Scree plot, which is useful for determining how many PCs to use in downstream analysis tasks. 

  • Click Apply 
  • Click Finish to run (Figure 20)

Figure 20. Configuring PCA to run on the Gene Expression data

A PCA task node will be produced. 

  • Double-click the PCA task node to open the PCA task report

 

The PCA task report includes the PCA plot, the Scree plot, the component loadings table, and the PC projections table. To switch between these elements, use the buttons in the upper right-hand corner of the task report. Each cell is shown as a dot on the PCA scatter plot.

  • Click the Scree plot button to open the Scree plot

The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured by its eigenvalue. The higher the eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the additional variance explained by each subsequent PC is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps such as graph-based clustering and t-SNE.
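The "levels off" judgment can also be approximated numerically. The heuristic below, with an arbitrary 1% variance threshold, is one possible sketch rather than Partek Flow's method; inspecting the Scree plot by eye, as described above, is the usual approach.

```python
import numpy as np

def pcs_before_plateau(eigenvalues, min_fraction=0.01):
    """Count PCs that each explain at least `min_fraction` of the total
    variance. The 1% cutoff is an illustrative assumption."""
    frac = np.asarray(eigenvalues, dtype=float)
    frac = frac / frac.sum()
    return max(int(np.sum(frac >= min_fraction)), 1)
```

PCs past the plateau each add less than the threshold fraction of variance and are excluded from the count.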

  • Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 21)

Figure 21. Identifying an optimal number of PCs

In this data set, a reasonable cut-off could be set anywhere between around 10 and 30 PCs. We will use 15 in downstream steps. 

Graph-based clustering