Creating a new project and importing the tutorial data set

The tutorial data set is available through Partek Flow. 

On the System information page, the Download tutorial data section includes pre-loaded data sets used by Partek Flow tutorials (Figure 2). 

 

The tutorial data set will be downloaded onto your Partek Flow server and a new project, Glioma (multi-sample), will be created. You will be directed to the Data tab of the new project. Because this is a tutorial project, there is no need to click on Import data, as the import is handled automatically (Figure 3). 

 

You can wait a few minutes for the download to complete, or check the download progress by selecting Queue then View queued tasks... to view the Queue (Figure 4).

 

Once the download completes, the sample table will appear in the Data tab (Figure 5).

Annotating samples with attributes

The Data tab displays the samples in the project - six Astrocytoma and four Oligodendroglioma tumor samples - with the number of cells in each sample (Figure 5). One of the goals of this analysis will be to compare gene expression in a cell type between the two Glioma subtypes. For this, we need to add an annotation indicating the subtype of each sample. 

There is now a new column, Subtype, in the Data tab, but every sample has a value of N/A. Next, we will assign each sample to a subtype. 

Normalizing single cell RNA-Seq data

With samples imported and annotated, we can begin analysis. 

For now, the Analyses tab has only a single node, Single cell data. As you perform the analysis, additional nodes representing tasks and new data will be created, forming a visual representation of your analysis pipeline. 

A context-sensitive menu will appear on the right-hand side of the pipeline (Figure 9). This menu includes tasks that can be performed on the selected data node. 

 

The Normalization task dialog will open with available normalization methods in the left-hand panel and a blank right-hand panel that will list our selected normalization steps in order of operation (Figure 11). 

 

The tutorial data set is taken from a published study and has already been normalized using TPM (Transcripts per million), which normalizes for length of feature and total reads (Wagner et al. 2012). This normalization method is also available in Partek Flow, along with other commonly used RNA-Seq data normalization methods. For more information on TPM and other normalization options, please see the Normalize Counts section of the user manual. In the published study using this data set, after TPM normalization, the authors performed three additional transformations, which we can easily replicate using Partek Flow. 
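As a rough illustration of what TPM does, the sketch below converts raw counts for a single cell into TPM using only the Python standard library. It is not Partek Flow's implementation; the function name `tpm`, the example counts, and the gene lengths are hypothetical and chosen only to show the two corrections (feature length, then total reads).

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts for one cell to TPM (illustrative sketch).

    counts: read counts per gene.
    lengths_kb: matching gene lengths in kilobases.
    """
    # Step 1: reads per kilobase corrects for feature length.
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    # Step 2: rescale so values sum to one million, correcting for total reads.
    scale = sum(rpk) / 1_000_000
    return [r / scale for r in rpk]


# Hypothetical example: three genes measured in a single cell.
values = tpm(counts=[100, 200, 300], lengths_kb=[1.0, 2.0, 3.0])
```

In this made-up example, the longer genes have proportionally more reads, so all three genes end up with the same TPM value, and the values sum to one million.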

The normalization dialog is now configured to divide the TPM values of each gene by 10, add 1, then perform a log2 transformation (Figure 12). This will replicate the normalization method in the published study, log2([TPM/10] + 1). 
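To make the order of operations concrete, here is a minimal Python sketch of the same three transformations applied to a list of TPM values. The function name `log_transform_tpm` and the input values are hypothetical; Partek Flow performs these steps through the Normalization dialog, not through this code.

```python
import math


def log_transform_tpm(tpm_values, divisor=10.0, offset=1.0):
    """Apply log2(TPM / divisor + offset) to each value (illustrative sketch)."""
    # Order matters: divide first, then add the offset, then take log2.
    return [math.log2(v / divisor + offset) for v in tpm_values]


# Hypothetical TPM values for one gene across three cells.
transformed = log_transform_tpm([0.0, 10.0, 70.0])
# 0 -> log2(1) = 0.0, 10 -> log2(2) = 1.0, 70 -> log2(8) = 3.0
```

Adding 1 before the log transformation keeps genes with a TPM of 0 at a transformed value of 0 instead of an undefined result.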

 

A Normalize counts task node and a Normalized count data node will be added to the Analyses tab. Initially, the nodes will be semi-transparent to indicate that they have been queued but not yet completed. A progress bar will appear on the Normalize counts task node to indicate that the task is running (Figure 13).

 

Most tasks can be queued on data nodes that have not yet been generated, so you can either wait for the normalization step to complete or proceed to the next section. 

Filtering cells in single cell RNA-Seq data

An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. These include doublets and cells damaged during cell isolation. 

One metric for assessing cell quality is the percentage of mitochondrial reads. If a cell has a high percentage of mitochondrial reads, it is likely undergoing apoptosis and should be excluded from analysis. To calculate the percentage of mitochondrial reads, the counts matrix needs to be associated with a relevant genome assembly and a gene/feature annotation that includes mitochondrial transcripts (Ensembl or GENCODE). 
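The calculation itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes human gene symbols in which mitochondrial genes carry the "MT-" prefix, and the function name `percent_mito`, the example cell, and the 20% cutoff are hypothetical, not values taken from Partek Flow or the published study.

```python
def percent_mito(gene_counts, mito_prefix="MT-"):
    """Percentage of a cell's counts assigned to mitochondrial genes.

    gene_counts: dict mapping gene symbol -> count for one cell.
    mito_prefix: assumed naming convention for mitochondrial genes.
    """
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0  # an empty cell has no reads to attribute
    mito = sum(
        count
        for gene, count in gene_counts.items()
        if gene.upper().startswith(mito_prefix)
    )
    return 100.0 * mito / total


# Hypothetical cell: 50 of 200 counts come from mitochondrial genes.
cell = {"MT-CO1": 30, "MT-ND1": 20, "GAPDH": 100, "ACTB": 50}
pct = percent_mito(cell)

MAX_MITO_PERCENT = 20.0  # hypothetical cutoff, chosen only for illustration
keep_cell = pct <= MAX_MITO_PERCENT
```

Here the example cell has 25% mitochondrial counts, so it would be excluded under the illustrative cutoff. This also shows why the annotation matters: without gene names that identify mitochondrial transcripts, there is nothing to match against.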

 

Filtering genes in single cell RNA-Seq data

A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes. Because there is no gold standard for what makes a gene informative, and ideal gene filtering criteria depend on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options.
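As one example of what such a criterion might look like, the Python sketch below keeps only genes detected in at least a given fraction of cells. This is just one of many possible filters and is not presented as the tutorial's or Partek Flow's recommended setting; the function name `filter_genes`, the thresholds, and the gene names are all hypothetical.

```python
def filter_genes(matrix, min_value=1.0, min_fraction=0.5):
    """Keep genes detected in at least min_fraction of cells (illustrative).

    matrix: dict mapping gene name -> list of per-cell expression values.
    min_value: value at or above which a gene counts as detected in a cell.
    """
    kept = {}
    for gene, values in matrix.items():
        detected = sum(1 for v in values if v >= min_value)
        if values and detected / len(values) >= min_fraction:
            kept[gene] = values
    return kept


# Hypothetical expression values for two genes across four cells.
expr = {
    "GENE_A": [0.0, 0.0, 0.0, 2.5],  # detected in 1 of 4 cells
    "GENE_B": [1.2, 3.4, 0.0, 5.1],  # detected in 3 of 4 cells
}
informative = filter_genes(expr)
```

With these illustrative thresholds, only GENE_B is retained. A different research question could justify different thresholds, or a different kind of filter entirely.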