This tutorial presents an outline of the basic series of steps for analyzing a 10x Genomics Gene Expression with Feature Barcoding (antibody) data set in Partek Flow starting with the output of Cell Ranger.
If you are starting with the raw data (FASTQ files), please begin with our Processing CITE-Seq data tutorial, which will take you from raw data to count matrix files. If you have Cell Hashing data, please see our documentation on Hashtag demultiplexing.
...
The data set for this tutorial is a demonstration data set from 10x Genomics. The sample includes cells from a dissociated Extranodal Marginal Zone B-Cell Tumor (MALT: Mucosa-Associated Lymphoid Tissue) stained with BioLegend TotalSeq-B antibodies. We are starting with the Feature / cell matrix HDF5 (filtered) produced by Cell Ranger.
Importing feature barcoding data
...
A Single cell counts data node will be created after the file has been imported.
Split matrix
The Single cell counts data node contains two different types of data, mRNA measurements and protein measurements. So that we can process these two different types of data separately, we will split the data by data type.
- Click the Single cell counts data node
- Click the Pre-analysis tools section of the toolbox
- Click Split matrix
A rectangle, or task node, will be created for Split matrix along with two output circles, or data nodes, one for each data type (Figure 2). The labels for these data types are determined by the features.csv file used when processing the data with Cell Ranger. Here, our data is labeled Gene Expression, for the mRNA data, and Antibody Capture, for the protein data.
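Outside of Partek Flow, the idea behind Split matrix can be illustrated with a minimal Python sketch. It assumes a small cells-by-features count matrix and a parallel array of feature type labels; the toy values and the helper name `split_by_feature_type` are made up for illustration and are not part of Partek Flow or Cell Ranger.

```python
import numpy as np

# Toy counts matrix (hypothetical values): rows are cells, columns are features.
counts = np.array([
    [5, 0, 3, 120, 40],
    [0, 2, 1, 300, 10],
    [7, 1, 0, 80, 95],
])

# One label per column, mirroring the two data types named in this tutorial.
feature_types = np.array([
    "Gene Expression", "Gene Expression", "Gene Expression",
    "Antibody Capture", "Antibody Capture",
])


def split_by_feature_type(counts, feature_types):
    """Return a dict mapping each feature type to the columns of that type."""
    return {
        str(ftype): counts[:, feature_types == ftype]
        for ftype in np.unique(feature_types)
    }


split = split_by_feature_type(counts, feature_types)
for ftype, matrix in split.items():
    print(ftype, matrix.shape)
```

Each sub-matrix keeps every cell but only the columns of one data type, which is what allows the two branches to be filtered and normalized independently.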
Filter low-quality cells
An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, and cells with too few reads to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts; these are low-quality cells and can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts. You can do this in Partek Flow using the Single cell QA/QC task.
We will start with the protein data.
...
This produces a Single cell QA/QC task node (Figure 4).
- Double-click the Single cell QA/QC task node to open the task report
The task report lists the number of counts per cell and the number of detected features per cell in two violin plots. For more information, please see our documentation for the Single cell QA/QC task. For this analysis, we will set a maximum counts threshold to exclude potential protein aggregates and, because we expect every cell to be bound by several antibodies, we will also set a minimum counts threshold.
...
- Click Apply filter to run the Filter cells task
The output is a Filtered single cell counts data node (Figure 6).
Next, we can repeat this process for the Gene Expression data node.
- Click the Gene Expression data node
- Click the QA/QC section in the toolbox
- Click Single Cell QA/QC
- Choose the assembly and annotation used for the gene expression data (Figure 3) from the drop-down menus
- Click Finish
This produces a Single cell QA/QC task node (Figure 7).
- Double-click the Single cell QA/QC task node to open the task report
The task report lists the number of counts per cell, the number of detected features per cell, and the percentage of mitochondrial reads per cell in three violin plots. For this analysis, we will set maximum and minimum thresholds for total counts and detected genes to exclude potential doublets, and a maximum mitochondrial reads percentage filter to exclude potential dead or dying cells.
...
- Click Apply filter to run the Filter cells task
The output is a Filtered single cell counts data node (Figure 9).
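The filtering logic used on both branches can be sketched in Python as follows. This is only an illustration of threshold-based cell filtering under stated assumptions: the toy matrix, the mitochondrial flag, the threshold values, and the helper names `qc_metrics` and `in_range` are all hypothetical and do not reflect the values chosen in this tutorial or the internals of Partek Flow.

```python
import numpy as np


def qc_metrics(counts, is_mito=None):
    """Compute per-cell QC metrics for a cells-by-features count matrix."""
    total = counts.sum(axis=1)
    metrics = {
        "total_counts": total,
        "detected_features": (counts > 0).sum(axis=1),
    }
    if is_mito is not None:
        mito = counts[:, is_mito].sum(axis=1)
        # Guard against division by zero for cells with no counts at all.
        metrics["pct_mito"] = 100.0 * mito / np.maximum(total, 1)
    return metrics


def in_range(values, minimum=None, maximum=None):
    """Boolean mask of cells whose metric lies within the optional bounds."""
    keep = np.ones(values.shape, dtype=bool)
    if minimum is not None:
        keep &= values >= minimum
    if maximum is not None:
        keep &= values <= maximum
    return keep


# Toy gene expression counts (hypothetical); the last column is mitochondrial.
counts = np.array([
    [10, 5, 0, 1],
    [1, 0, 0, 0],
    [200, 150, 90, 5],
    [3, 2, 1, 14],
])
is_mito = np.array([False, False, False, True])

metrics = qc_metrics(counts, is_mito)
# Hypothetical thresholds, chosen only to exercise each filter once.
keep = (
    in_range(metrics["total_counts"], minimum=5, maximum=100)
    & in_range(metrics["detected_features"], minimum=2)
    & in_range(metrics["pct_mito"], maximum=20.0)
)
filtered = counts[keep]
print(filtered.shape)
```

For the protein branch, the same helpers would be called without `is_mito`, using only the counts thresholds.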
Normalization
After excluding low-quality cells, we can normalize the data.
We will start with the protein data. We will normalize this data using Centered log-ratio (CLR). CLR was used to normalize antibody capture protein counts data in the paper that introduced CITE-Seq (Stoeckius et al. 2017) and in subsequent publications on similar assays (Stoeckius et al. 2018, Mimitou et al. 2018). CLR normalization includes the following steps: Add 1, Divide by Geometric mean, Add 1, log base e.
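The four CLR steps listed above can be written out as a short Python sketch. Two points are assumptions rather than statements about Partek Flow: the geometric mean is taken per cell across all antibody features, and it is computed on the values after the first "Add 1" step. The function name `clr_normalize` is hypothetical.

```python
import numpy as np


def clr_normalize(counts):
    """CLR sketch for a cells-by-features matrix of antibody counts.

    Steps: add 1, divide by the geometric mean, add 1, natural log.
    Assumption: the geometric mean is computed per cell (row) on the
    shifted values.
    """
    counts = np.asarray(counts, dtype=float)
    shifted = counts + 1.0  # Add 1
    # Geometric mean per cell, computed in log space for numerical stability.
    geo_mean = np.exp(np.log(shifted).mean(axis=1, keepdims=True))
    ratio = shifted / geo_mean  # Divide by geometric mean
    return np.log(ratio + 1.0)  # Add 1, log base e


# Toy antibody counts (hypothetical): 2 cells, 3 antibodies.
protein_counts = np.array([
    [0, 0, 0],
    [1, 3, 7],
])
protein_clr = clr_normalize(protein_counts)
print(protein_clr)
```

Because each cell is divided by its own geometric mean, the result describes how strongly each antibody is detected relative to the other antibodies in the same cell, rather than the absolute count.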
...
Normalization produces a Normalized counts data node on the Antibody Capture branch of the pipeline.
Next, we can normalize the mRNA data. We will use the recommended normalization method in Partek Flow, which accounts for differences in library size, or the total number of UMI counts, per cell and log transforms the data. To match the CLR normalization used on the Antibody Capture data, we will use a log e transformation instead of the default log 2.
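As a rough illustration of library-size normalization followed by a natural log transformation, consider the Python sketch below. The scaling factor of one million and the offset of 1 are assumptions made for this example, not confirmed Partek Flow defaults, and `log_normalize` is a hypothetical helper name.

```python
import numpy as np


def log_normalize(counts, scale=1e6, offset=1.0):
    """Scale each cell to a common library size, then apply log base e.

    Assumptions: `scale` (counts per million) and `offset` (add 1 before the
    log) are illustrative choices.
    """
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)
    library_size[library_size == 0] = 1.0  # avoid dividing by zero
    scaled = counts / library_size * scale
    return np.log(scaled + offset)


# Toy UMI counts (hypothetical): the second cell has 10x the sequencing depth
# of the first but the same relative expression profile.
rna_counts = np.array([
    [1, 1, 2],
    [10, 10, 20],
])
rna_normalized = log_normalize(rna_counts)
print(rna_normalized)
```

The two toy cells end up with identical normalized values, which is the purpose of accounting for library size: differences in total UMI counts per cell no longer dominate the comparison.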
...
Normalization produces a Normalized counts data node on the Gene Expression branch of the pipeline (Figure 12).
Merge Protein and mRNA data
For quality filtering and normalization, we needed to have the two data types separate as the processing steps were distinct, but for downstream analysis we want to be able to analyze protein and mRNA data together. To bring the two data types back together, we will merge the two normalized counts data nodes.
- Click the Normalized counts data node on the Antibody Capture branch of the pipeline
- Click the Pre-analysis tools section of the toolbox
- Click Merge matrices
- Click Select data node to launch the data node selector
Data nodes that can be merged with the Antibody Capture branch Normalized counts data node are shown in color (Figure 13).
- Click the Normalized counts data node on the Gene Expression branch of the pipeline
A black outline will appear around the chosen data node.
- Click Select
- Click Finish to run the task
The output is a Merged counts data node (Figure 14). This data node will include the normalized counts of our protein and mRNA data. The intersection of cells from the two input data nodes is retained so only cells that passed the quality filter for both protein and mRNA data will be included in the Merged counts data node.
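The intersection behavior described above can be sketched in Python. The example assumes each branch is held as a cells-by-features matrix with a matching array of cell barcodes; the barcodes, values, and the helper name `merge_on_shared_cells` are hypothetical.

```python
import numpy as np


def merge_on_shared_cells(barcodes_a, matrix_a, barcodes_b, matrix_b):
    """Keep only cells present in both inputs and join their features."""
    shared, idx_a, idx_b = np.intersect1d(
        barcodes_a, barcodes_b, return_indices=True
    )
    # Rows are reordered so that both matrices follow the shared barcode order.
    merged = np.hstack([matrix_a[idx_a], matrix_b[idx_b]])
    return shared, merged


# Toy normalized data (hypothetical barcodes and values).
protein_barcodes = np.array(["AAAC", "AAAG", "AAAT"])
protein = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

rna_barcodes = np.array(["AAAT", "AAAC", "AACC"])
rna = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])

shared_barcodes, merged = merge_on_shared_cells(
    protein_barcodes, protein, rna_barcodes, rna
)
print(shared_barcodes, merged.shape)
```

Cells that were filtered out of either branch (here "AAAG" and "AACC") do not appear in the merged matrix, and the merged matrix has the protein columns followed by the mRNA columns.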
Collapsing tasks to simplify the pipeline
To simplify the appearance of the pipeline, we can group task nodes into a single collapsed task. Here, we will collapse the filtering and normalization steps.
...
Tasks that can form the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 16). We have chosen the Split matrix task as the start and we can choose Merge matrices as the end of the collapsed section.
- Click Merge matrices to choose it as the end of the collapsed section
The section of the pipeline that will form the collapsed task is highlighted in green.
...
To view the tasks in Data processing, we can expand the collapsed task.
- Double-click Data processing to expand it
When expanded, the collapsed task is shown as a shaded section of the pipeline with a title bar (Figure 19).
To re-collapse the task, you can double-click the title bar or click the collapse icon in the title bar. To remove the collapsed task, you can click the remove icon. Please note that this will not remove the tasks, just the grouping.
- Double-click the Data processing title bar to re-collapse
Choosing the number of PCs
In this data set, we have two data types. We can choose to run analysis tasks on one or both of the data types. Here, we will run PCA on only the mRNA data to find the optimal number of PCs for the mRNA data.
- Click the Merged counts node
- Click Exploratory analysis in the task menu
- Click PCA
Because we have multiple data types, we can choose which we want to use for the PCA calculation.
- Click Gene Expression for Include features where "Feature type" is
- Click Configure to access the advanced settings
- Click Generate PC quality measures
This will generate a Scree plot, which is useful for determining how many PCs to use in downstream analysis tasks.
...
A PCA task node will be produced.
- Double-click the PCA task node to open the PCA task report
The PCA task report includes the PCA plot, the Scree plot, the component loadings table, and the PC projections table. To switch between these elements, use the buttons in the upper right-hand corner of the task report. Each cell is shown as a dot on the PCA scatter plot.
- Click the Scree plot button to open the Scree plot
The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured as an eigenvalue. The higher the eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering and t-SNE.
...
In this data set, a reasonable cut-off could be set anywhere between around 10 and 30 PCs. We will use 15 in downstream steps.
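To make the relationship between PCs, eigenvalues, and the Scree plot more concrete, here is a minimal Python sketch that computes the eigenvalues a Scree plot would display. It uses synthetic data with three strong underlying components; the data, the random seed, and the helper name `scree_eigenvalues` are assumptions for illustration and do not reproduce the tutorial data set or Partek Flow's PCA implementation.

```python
import numpy as np


def scree_eigenvalues(data):
    """Eigenvalues of the covariance matrix of a cells-by-features matrix.

    Computed from the singular values of the centered data; the result is
    sorted from largest to smallest, one value per PC.
    """
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / (data.shape[0] - 1)


rng = np.random.default_rng(0)
# Synthetic data: 200 cells, 20 features, 3 strong latent components plus noise.
latent = rng.normal(size=(200, 3)) * np.array([10.0, 6.0, 4.0])
loadings = rng.normal(size=(3, 20))
data = latent @ loadings + rng.normal(scale=0.5, size=(200, 20))

eigenvalues = scree_eigenvalues(data)
explained = eigenvalues / eigenvalues.sum()

# Print the first few PCs; the values drop sharply once the signal is used up.
for pc, (value, fraction) in enumerate(zip(eigenvalues[:6], explained[:6]), start=1):
    print(f"PC{pc}: eigenvalue={value:.2f}, fraction of variance={fraction:.3f}")
```

In this synthetic case the eigenvalues level off after the third PC, which is the kind of bend one looks for when reading a Scree plot. Real single cell data usually shows a more gradual decline, which is why a range of cut-offs can be reasonable.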
Graph-based clustering
Prior to beginning, transfer this file to your Partek Flow using the Transfer files button on the homepage.