Partek Flow Documentation

Filtering cells

An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. Examples of low-quality cells include doublets, cells damaged during cell isolation, and cells with too few reads to be analyzed.

  • Click on the Single cell data node
  • Click on the QA/QC section of the task menu
  • Click on Single cell QA/QC

A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running. 

  • Click the Single cell QA/QC node once it finishes running
  • Click Task report in the task menu 

The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures. 

There are three plots: number of UMI counts per cell, number of detected genes per cell, and the percentage of mitochondrial counts per cell.

Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Cells can be filtered either by clicking and dragging to select a region on one of the plots or by setting thresholds using the filters below the plots. Here, we will apply a filter for the number of read counts.

The plot will be shaded to reflect the filter. Cells that are excluded will be shown as black dots on all of the plots.

The UMI counts per cell and number of detected genes per cell are typically used to filter out potential doublets - if a cell has an unusually high number of total UMIs or detected genes, it may be a doublet. The mitochondrial counts percentage can be used to identify cells damaged during cell isolation - if a cell has a high percentage of mitochondrial counts, it is likely damaged or dying and may need to be excluded.
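
To make the three quality measures concrete, here is a minimal Python sketch of how they could be computed and thresholded outside of Partek Flow. It is not Partek Flow's implementation. The toy counts, the function names, the threshold values, and the use of the "MT-" gene symbol prefix to identify mitochondrial genes are all assumptions made for illustration.

```python
# Toy UMI counts: one list per cell, one entry per gene (hypothetical data).
genes = ["ACTB", "GAPDH", "MT-CO1", "MT-ND1"]
counts = {
    "cell_1": [50, 40, 5, 5],       # typical cell
    "cell_2": [400, 350, 30, 20],   # unusually high total counts: possible doublet
    "cell_3": [10, 0, 45, 45],      # mostly mitochondrial counts: possibly damaged
}

def qc_metrics(row, gene_names, mito_prefix="MT-"):
    """Return (total UMI counts, detected genes, % mitochondrial counts) for one cell."""
    total = sum(row)
    detected = sum(1 for c in row if c > 0)
    # Assumption: mitochondrial genes are recognized by their symbol prefix.
    mito = sum(c for c, g in zip(row, gene_names) if g.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return total, detected, pct_mito

def keep_cell(metrics, max_counts, max_genes, max_pct_mito):
    """Apply upper thresholds to the three quality measures."""
    total, detected, pct_mito = metrics
    return total <= max_counts and detected <= max_genes and pct_mito <= max_pct_mito

# Illustrative thresholds only; suitable cutoffs depend on the data set.
kept = [
    name for name, row in counts.items()
    if keep_cell(qc_metrics(row, genes), max_counts=500, max_genes=4, max_pct_mito=20.0)
]
print(kept)  # ['cell_1']
```

In this toy example, cell_2 is excluded for its high total UMI count and cell_3 for its high mitochondrial percentage.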

Filtering genes 

A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes. Because there is no gold standard for what makes a gene informative or not and ideal gene filtering criteria depend on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options. 

  • Click the Single cell data node produced by the Filter cells task
  • Click Filtering in the task menu
  • Click Filter features 

There are three categories of filter available - noise reduction, statistics based, and feature list. 

The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.
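
As a rough illustration of a statistics based filter, the Python sketch below keeps the N genes with the highest variance across cells. It only shows the idea; the toy data, the function name, and the choice of population variance as the metric are assumptions, not a description of how Partek Flow computes the filter.

```python
from statistics import pvariance

# Toy matrix: gene -> expression values across four cells (hypothetical data).
expression = {
    "GENE_A": [1, 1, 1, 1],     # constant: variance 0
    "GENE_B": [0, 10, 0, 10],   # highly variable
    "GENE_C": [2, 3, 2, 3],     # slightly variable
}

def top_variable_genes(matrix, n):
    """Return the names of the n genes with the highest variance across cells."""
    ranked = sorted(matrix, key=lambda gene: pvariance(matrix[gene]), reverse=True)
    return ranked[:n]

top2 = top_variable_genes(expression, 2)
print(top2)  # ['GENE_B', 'GENE_C']
```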

We will use a noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file.

  • Click the Noise reduction filter check box 
  • Set the Noise reduction filter to Exclude features where value == 0 in 100% of cells using the drop-down menus and text boxes
  • Click Finish to apply the filter

This produces a Filtered counts data node. This will be the starting point for the next stage of analysis - identifying cell types in the data using the interactive t-SNE plot. 
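
The logic of the filter applied above, Exclude features where value == 0 in 100% of cells, can be sketched in Python as follows. This is a simplified illustration under our own assumptions: the toy data and the function name are hypothetical, and treating the percentage as an "at least" cutoff is our generalization for values below 100%.

```python
# Toy matrix: gene -> counts across four cells (hypothetical data).
matrix = {
    "GENE_A": [0, 0, 0, 0],   # never detected: excluded
    "GENE_B": [0, 2, 0, 0],   # detected in one cell: kept
    "GENE_C": [5, 1, 3, 7],   # detected in every cell: kept
}

def exclude_features(matrix, value=0, percent_of_cells=100.0):
    """Drop features whose count equals `value` in at least `percent_of_cells` % of cells."""
    kept = {}
    for gene, row in matrix.items():
        pct = 100.0 * sum(1 for c in row if c == value) / len(row)
        if pct < percent_of_cells:
            kept[gene] = row
    return kept

filtered = exclude_features(matrix)
print(list(filtered))  # ['GENE_B', 'GENE_C']
```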

Normalization

Because different cells will have a different number of total UMIs, it is important to normalize the data prior to downstream analysis. For droplet-based single cell isolation and library preparation methods that use a 3' counting strategy, where only the 3' end of each transcript is captured and sequenced, we recommend the following normalization: 1. CPM (counts per million), 2. Add 1, 3. Log2. This accounts for differences in total UMI counts per cell and log transforms the data, which makes the data easier to visualize.

  • Click the Filtered counts node produced by the Filter features task
  • Click Normalization and scaling in the task menu
  • Click Normalization 
  • Click  to add the recommended normalization scheme 

This adds CPM (counts per million), Add 1, and Log2 to the Normalization order panel. Normalization steps are performed in the order listed, from top to bottom.

  • Click Finish to apply the normalization

A new Normalized counts data node will be produced.
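
For readers who want to see the arithmetic, the recommended scheme (CPM, Add 1, Log2) can be written as a short Python sketch. The function name and the toy counts are hypothetical, and this is an illustration of the formula rather than Partek Flow's code.

```python
import math

def normalize_cell(counts):
    """Apply CPM, add 1, then log2 to the UMI counts of a single cell."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # empty cell: nothing to scale
    normalized = []
    for c in counts:
        cpm = c / total * 1_000_000            # 1. counts per million
        normalized.append(math.log2(cpm + 1))  # 2. add 1, 3. log2
    return normalized

# Two hypothetical cells with the same composition but a 10x difference in depth.
shallow = normalize_cell([1, 3, 0])
deep = normalize_cell([10, 30, 0])
print(shallow == deep)  # True
print(shallow[2])       # 0.0
```

The two cells receive identical normalized values because CPM removes the difference in total UMI counts, and adding 1 before the log keeps zero counts at zero.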

For more information on normalizing data in Partek Flow, please see the Normalize Counts section of the user manual.  

Scaling 

For some data sets, it may be necessary to remove technical artifacts or batch effects. To do this, you can use the Scaling task in the Normalization and Scaling section. The scaling task is detailed in our Single Cell Scaling white paper. We will not perform scaling for this data set. 

PCA

Principal component analysis (PCA) is an exploratory technique that is used to describe the structure of high dimensional data by reducing its dimensionality. Because PCA is used to reduce the dimensionality of the data prior to clustering as part of a standard single cell analysis workflow, it is useful to examine the results of PCA for your data set prior to clustering.

  • Click the Normalized counts node 
  • Click Exploratory analysis in the task menu
  • Click PCA

You can choose Features contribute equally to standardize the genes prior to PCA, or choose by variance to allow more variable genes to have a larger effect on the PCA. By default, we take variance into account and focus on the most variable genes.
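
The difference between the two choices can be illustrated with a small NumPy sketch. We assume here that "equally" corresponds to scaling each gene to unit variance before PCA and that "by variance" corresponds to centering only; that reading, the function name, and the simulated data are our assumptions and may not match Partek Flow's internals exactly.

```python
import numpy as np

def pca(X, n_components=2, equal_contribution=False):
    """PCA via SVD. X is cells x genes. Returns (scores, loadings)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if equal_contribution:
        # Standardize each gene so that all genes have unit variance.
        sd = centered.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # leave constant genes unscaled to avoid dividing by zero
        centered = centered / sd
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]  # one row per PC, one column per gene
    return scores, loadings

# Simulated data: gene 0 is far more variable than gene 1.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 10, 200), rng.normal(0, 1, 200)])

_, by_variance = pca(X)
_, equal = pca(X, equal_contribution=True)
print(np.abs(by_variance[0]))  # PC1 is dominated by the high-variance gene
print(np.abs(equal[0]))        # both genes contribute equally to PC1
```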

If you have multiple samples, you can choose to run PCA for each sample individually or for all samples together using the Split cells by sample option. 

  • Click Configure to access the advanced settings
  • Click Generate PC quality measures 

This will generate a Scree plot and a component loadings table, which are useful for determining how many PCs to use for downstream analysis tasks.
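
As background, a Scree plot shows the variance explained by each PC. The NumPy sketch below computes those values for a toy matrix and applies one possible rule for choosing the number of PCs. The data, the function name, and the 90% cumulative variance cutoff are assumptions for illustration only; this tutorial does not prescribe a specific cutoff.

```python
import numpy as np

def scree_values(X):
    """Return (eigenvalues, proportion of variance explained) for each PC."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2 / (X.shape[0] - 1)
    return eigenvalues, eigenvalues / eigenvalues.sum()

# Toy matrix: five cells (rows) by three genes (columns), hypothetical values.
X = np.array([
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 1.0],
    [6.0, 0.0, 2.0],
    [8.0, 1.0, 2.0],
    [10.0, 0.5, 3.0],
])
eigenvalues, ratio = scree_values(X)
cumulative = np.cumsum(ratio)

# Illustrative rule: smallest number of PCs reaching 90% cumulative variance.
n_pcs = int(np.searchsorted(cumulative, 0.90) + 1)
print(ratio)
print(n_pcs)
```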

  • Click Apply
  • Click Finish to run 

A new PCA task node will be produced.

  • Double-click the PCA task node to open the PCA task report

The PCA task report includes 


 

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
