Creating a new project and importing the tutorial data set

The tutorial data set is available through Partek® Flow®.

Click your avatar (Figure 1)

Click Settings

On the System information page, the Download tutorial data section includes pre-loaded data sets used by Partek Flow tutorials (Figure 2).

Click Single cell glioma (multi-sample)

The tutorial data set will be downloaded onto your Partek Flow server and a new project, Glioma (multi-sample), will be created. You will be directed to the Data tab of the new project. Because this is a tutorial project, the data are imported automatically and there is no need to click Import data (Figure 3).

You can wait a few minutes for the download to complete, or check its progress by selecting Queue, then View queued tasks... (Figure 4).

Once the download completes, the sample table will appear in the Data tab (Figure 5).

Annotating samples with attributes

The Data tab displays the samples in the project with the number of cells in each sample (Figure 5). One goal of this analysis is to compare gene expression within a cell type between the two glioma subtypes. To do this, we need to add an annotation indicating the subtype of each sample.

Click Manage attributes
Click Add new attribute (Figure 6)

Type Subtype in the Name text field
Click Add (Figure 7)

Type Astrocytoma in the New category text field
Click Add
Type Oligodendroglioma in the New category text field
Click Close
Click Back to sample management table

There is now a new column, Subtype, in the Data tab, but every sample has a value of N/A. Next, we will assign each sample to a subtype.

Click Edit attributes
Use the drop-down menus to assign each sample to its corresponding subtype (Figure 8)

Sample Name    Subtype
MGH36          Oligodendroglioma
MGH42          Astrocytoma
MGH45          Astrocytoma
MGH53          Oligodendroglioma
MGH54          Oligodendroglioma
MGH56          Astrocytoma
MGH60          Oligodendroglioma
MGH64          Astrocytoma

Once each sample has been assigned to a subtype, click Apply changes to proceed
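Conceptually, the Subtype attribute is a lookup from sample name to category. The sketch below is a hypothetical illustration outside Partek Flow (SUBTYPE_BY_SAMPLE, samples_by_subtype, and unassigned are names invented here, not Partek Flow functions). It groups the samples from the table by subtype and flags any sample that would still show N/A.

```python
# Hypothetical sample -> subtype lookup, copied from the assignment table.
SUBTYPE_BY_SAMPLE = {
    "MGH36": "Oligodendroglioma",
    "MGH42": "Astrocytoma",
    "MGH45": "Astrocytoma",
    "MGH53": "Oligodendroglioma",
    "MGH54": "Oligodendroglioma",
    "MGH56": "Astrocytoma",
    "MGH60": "Oligodendroglioma",
    "MGH64": "Astrocytoma",
}


def samples_by_subtype(assignments):
    """Invert a sample -> subtype mapping into subtype -> sorted sample names."""
    groups = {}
    for sample, subtype in assignments.items():
        groups.setdefault(subtype, []).append(sample)
    return {subtype: sorted(samples) for subtype, samples in groups.items()}


def unassigned(samples, assignments):
    """Return the samples that have no subtype yet (shown as N/A)."""
    return [s for s in samples if s not in assignments]


groups = samples_by_subtype(SUBTYPE_BY_SAMPLE)
for subtype, samples in sorted(groups.items()):
    print(subtype, samples)
```

With this table, each subtype receives four samples, which gives a balanced comparison between the two groups.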

Filtering cells in single cell RNA-Seq data

With samples imported and annotated, we can begin analysis.

Click Analyses to switch to the Analyses tab

For now, the Analyses tab has only a single node, Single cell data. As you perform the analysis, additional nodes representing tasks and new data will be created, forming a visual representation of your analysis pipeline.

Click on the Single cell data node

A context-sensitive menu will appear on the right-hand side of the pipeline (Figure 9). This menu includes tasks that can be performed on the selected data node.

An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. Examples of low-quality cells include doublets, cells damaged during cell isolation, and cells with too few reads to be analyzed.

Click the QA/QC section of the task menu
Click Single cell QA/QC (Figure 10)

A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running.

Click the Single cell QA/QC node once it finishes running
Click Task report on the task menu (Figure 11)

The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 12).

For this data set, there are two plots: number of reads per cell and number of detected genes per cell. Typically, there is a third plot showing the percentage of mitochondrial reads per cell, but mitochondrial transcripts were not included in the data set by the study authors.
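Both metrics are simple per-cell summaries of the expression matrix. The following is a minimal sketch, assuming a genes × cells matrix of raw counts stored as a Python list of rows; qc_metrics is a hypothetical helper, and Partek Flow's own implementation and storage format are not described here.

```python
def qc_metrics(counts):
    """Per-cell QC metrics from a genes x cells count matrix (list of rows).

    Returns two lists with one entry per cell:
      - total reads: the sum of the cell's column
      - detected genes: the number of genes with a count > 0 in that cell
    """
    n_cells = len(counts[0]) if counts else 0
    total_reads = [0] * n_cells
    detected_genes = [0] * n_cells
    for gene_row in counts:
        for j, value in enumerate(gene_row):
            total_reads[j] += value
            if value > 0:
                detected_genes[j] += 1
    return total_reads, detected_genes


# Toy matrix (made-up values): 3 genes (rows) x 4 cells (columns).
toy = [
    [5, 0, 12, 0],
    [0, 0, 3, 7],
    [1, 0, 0, 2],
]
reads, genes = qc_metrics(toy)
print(reads)  # [6, 0, 15, 9]
print(genes)  # [2, 0, 2, 2]
```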

Each point on the plots is a cell, and the violins illustrate the distribution of values for the y-axis metric. Cells can be filtered either by clicking and dragging to select a region on one of the plots or by setting thresholds using the filters below the plots. Here, we will filter on read counts.

Set the Read counts filter to Keep cells between 8000 and 20500 reads

The plot will be shaded to reflect the filter. Cells that are excluded will be shown as black dots on both plots (Figure 13).

Because this data set was already filtered by the study authors to include only high-quality cells, this read counts filter is sufficient.

Click Apply filter

A new task, Filter cells, is added to the Analyses tab. This task produces a new Single cell data node (Figure 14).

Most tasks can be queued on data nodes that have not yet been generated, so you can either wait for the filtering step to complete or proceed to the next section.
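The read-count filter amounts to keeping the cells whose total reads fall inside the chosen range. This sketch assumes the same genes × cells list-of-rows layout and treats both bounds as inclusive; whether Partek Flow's "Keep cells between" filter includes the endpoints is an assumption here, and filter_cells_by_reads is a hypothetical name.

```python
def filter_cells_by_reads(counts, low=8000, high=20500):
    """Keep cells whose total reads fall within [low, high] (inclusive bounds assumed).

    counts is a genes x cells matrix given as a list of rows. Returns the
    indices of the kept cells and the filtered matrix (same genes, fewer columns).
    """
    n_cells = len(counts[0]) if counts else 0
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    keep = [j for j, total in enumerate(totals) if low <= total <= high]
    filtered = [[row[j] for j in keep] for row in counts]
    return keep, filtered


# Toy matrix: per-cell totals are 5000, 8000, 20500, and 30000 reads.
toy = [
    [3000, 4000, 10000, 15000],
    [2000, 4000, 10500, 15000],
]
keep, filtered = filter_cells_by_reads(toy)
print(keep)  # [1, 2]
```

Because the filter only drops columns, downstream steps see the same genes with fewer cells, which mirrors the new Single cell data node produced by the Filter cells task.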

Filtering genes in single cell RNA-Seq data

A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes. There is no gold standard for what makes a gene informative, and the ideal filtering criteria depend on your experimental design and research question, so Partek Flow offers a wide variety of flexible filtering options.

Click the Single cell data node produced by the Filter cells task
Click Filtering in the task menu
Click Filter features (Figure 15)

There are three categories of filter available: noise reduction, statistics-based, and feature list (Figure 16).

The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics-based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.
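As an illustration of a statistics-based filter, the sketch below keeps the n genes with the highest variance across cells. It is not Partek Flow's implementation: top_genes_by_variance is a hypothetical helper, and it uses population variance (statistics.pvariance), which may differ from the variance definition Partek Flow applies.

```python
from statistics import pvariance


def top_genes_by_variance(matrix, gene_names, n):
    """Return the names of the n genes with the highest variance across cells.

    matrix is a genes x cells list of rows. sorted() is stable, so genes
    with equal variance keep their original order.
    """
    variances = [pvariance(row) for row in matrix]
    order = sorted(range(len(gene_names)), key=lambda i: -variances[i])
    return [gene_names[i] for i in order[:n]]


# Toy data: gene A is constant, B varies the most, C varies a little.
matrix = [
    [1, 1, 1, 1],
    [0, 10, 0, 10],
    [2, 4, 2, 4],
]
print(top_genes_by_variance(matrix, ["A", "B", "C"], 2))
```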

We will use a noise reduction filter to exclude genes that were included in the matrix file but are expressed in few or no cells in the data set.

Click the Noise reduction filter check box
Set the Noise reduction filter to Exclude features where value <= 0 in 99% of cells using the drop-down menus and text boxes (Figure 16)
Click Finish to apply the filter
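The noise reduction setting above can be read as: drop a gene if its value is at most 0 in at least 99% of cells. The sketch below assumes that reading (the exact boundary handling in Partek Flow is an assumption) and uses integer arithmetic to avoid floating-point rounding at the 99% boundary. noise_reduction_filter is a hypothetical name.

```python
def noise_reduction_filter(matrix, gene_names, threshold=0, percent=99):
    """Drop genes whose value is <= threshold in at least `percent`% of cells.

    matrix is a genes x cells list of rows. Returns the names of kept genes.
    The comparison n_low / n_cells < percent / 100 is done with integers.
    """
    kept = []
    for name, row in zip(gene_names, matrix):
        n_low = sum(1 for value in row if value <= threshold)
        if n_low * 100 < percent * len(row):
            kept.append(name)
    return kept


# Toy data with 100 cells per gene:
#   X is zero everywhere, Y is nonzero in 1 cell, Z is nonzero in 2 cells.
X = [0] * 100
Y = [5] + [0] * 99
Z = [5, 3] + [0] * 98
print(noise_reduction_filter([X, Y, Z], ["X", "Y", "Z"]))  # ['Z']
```

Under this reading, a gene detected in exactly 1% of cells (Y) is also removed, so the filter is slightly stricter than excluding only genes with no expression at all.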

This produces a Filtered counts data node, which will be the starting point for the next stage of analysis: identifying cell types in the data using the interactive t-SNE plot.

Normalizing single cell RNA-Seq data

We are omitting normalization in this tutorial because the data has already been normalized.

The tutorial data set is taken from a published study and has already been normalized using transcripts per million (TPM), which normalizes for feature length and total reads, and then transformed as log2(TPM/10 + 1). This normalization and transformation scheme can be performed in Partek Flow, along with other commonly used RNA-Seq data normalization methods.
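For reference, the standard TPM calculation divides each feature's count by its length in kilobases and rescales the resulting rates so they sum to one million; the study's values were then transformed as log2(TPM/10 + 1). The sketch below follows that textbook definition for a single sample. It is not necessarily the exact pipeline the study authors used, and tpm and log_transform are hypothetical helper names.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts: reads assigned to each feature; lengths_bp: feature lengths in bases.
    Each count is divided by its feature length in kilobases (length
    normalization), then the rates are scaled to sum to one million
    (library-size normalization).
    """
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no reads; TPM is undefined")
    return [r / total * 1e6 for r in rates]


def log_transform(tpm_values):
    """Apply the log2(TPM/10 + 1) transformation."""
    return [math.log2(t / 10 + 1) for t in tpm_values]


values = tpm([100, 200, 0], [1000, 2000, 500])
print(values)  # [500000.0, 500000.0, 0.0]
print(log_transform(values))
```

In the toy example, the second feature has twice the reads of the first but is also twice as long, so both end up with the same TPM.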

For more information on normalizing data in Partek Flow, please see the Normalize Counts section of the user manual.
