Getting started with the tutorial data set

Creating a new project and importing the tutorial data set

The tutorial data set is available through Partek Flow.

Click your avatar (Figure 1)

Figure 1. Location of the Settings link on the main page of Partek Flow

Click Settings

On the System information page, the Download tutorial data section includes pre-loaded data sets used by Partek Flow tutorials (Figure 2).

Click Glioma (multi-sample)

The tutorial data set will be downloaded onto your Partek Flow server and a new project, Glioma (multi-sample), will be created. You will be directed to the Data tab of the new project. Because this is a tutorial project, there is no need to click on Import data, as the import is handled automatically (Figure 3).

You can wait a few minutes for the download to complete, or check the download progress by selecting Queue then View queued tasks... to view the Queue (Figure 4).

Once the download completes, the sample table will appear in the Data tab (Figure 5).

Figure 5. Sample data table listing the name and the number of cells for each sample

Annotating samples with attributes

The Data tab displays the samples in the project - six Astrocytoma and four Oligodendroglioma tumor samples - with the number of cells in each sample (Figure 5). One of the goals of this analysis will be to compare gene expression in a cell type between the two Glioma subtypes. For this, we need to add an annotation indicating the subtype of each sample.

Click Manage attributes
Click Add new attribute (Figure 6)

Figure 6. Adding an attribute

Type Subtype in the Name text field
Click Add (Figure 7)

Figure 7. Adding Subtype as an attribute

Type Astrocytoma in the New category text field
Click Add
Type Oligodendroglioma in the New category text field
Click Close
Click Back to sample management table

There is a new column, Subtype, in the Data tab, but every sample has a value of N/A. Next, we will assign each sample to a subtype.

Click Edit attributes
Use the drop-down menus to assign each sample to its corresponding subtype (Figure 8)

Figure 8. Assigning samples to subtypes

Once each sample has been assigned to a subtype, click Apply changes to proceed

Normalizing single cell RNA-Seq data

With samples imported and annotated, we can begin analysis.

Click Analyses to switch to the Analyses tab

For now, the Analyses tab has only a single node, Single cell data. As you perform the analysis, additional nodes representing tasks and new data will be created, forming a visual representation of your analysis pipeline.

Click on the Single cell data node

A context-sensitive menu will appear on the right-hand side of the pipeline (Figure 9). This menu includes tasks that can be performed on the selected data node.

Figure 9. Clicking on a data node opens the context-sensitive task menu

Click on the Normalization and scaling section
Click on Normalization (Figure 10)

Figure 10. Selecting the Normalization task from the task menu

The Normalization task dialog will open with available normalization methods in the left-hand panel and a blank right-hand panel that will list our selected normalization steps in order of operation (Figure 11).

Figure 11. Read count normalization dialog

The tutorial data set is taken from a published study and has already been normalized using TPM (Transcripts per million), which normalizes for length of feature and total reads (Wagner et al. 2012). This normalization method is also available in Partek Flow, along with other commonly used RNA-Seq data normalization methods. For more information on TPM and other normalization options, please see the Normalize Counts section of the user manual. In the published study using this data set, after TPM normalization, the authors performed three additional transformations, which we can easily replicate using Partek Flow.
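
As a rough illustration of what TPM does, the sketch below converts read counts for a single cell into TPM by first dividing by feature length and then scaling so the values sum to one million. The function name `tpm` and the toy numbers are made up for this example; it is not Partek Flow's implementation.

```python
def tpm(counts, lengths_kb):
    """Convert read counts for one cell into TPM.

    counts: read counts per gene.
    lengths_kb: gene lengths in kilobases, in the same gene order.
    """
    # Step 1: normalize for feature length (reads per kilobase).
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        # A cell with no reads has no meaningful TPM; return zeros.
        return [0.0 for _ in rates]
    # Step 2: normalize for total reads so the values sum to one million.
    return [r / total * 1_000_000 for r in rates]


# Toy input (hypothetical values): three genes whose counts scale with length.
example = tpm([10, 20, 30], [1.0, 2.0, 3.0])
```

Because the counts in the toy input are proportional to gene length, all three genes end up with the same TPM value, which is the length correction at work.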

Drag Divide by from the left panel to the right panel
Select Custom value from the Divide by drop-down menu
Set the Custom value to 10
Drag Add from the left panel to the right panel
Drag Log from the left panel to the right panel

The normalization dialog is now configured to divide the TPM value of each gene by 10, add 1, and then perform a log2 transformation (Figure 12). This replicates the normalization method used in the published study, log2([TPM/10]+1).

Figure 12. Replicating the published normalization method of log2([TPM/10]+1)
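
The three steps configured in the dialog correspond to a simple per-value formula. The following is a minimal sketch of that arithmetic outside Partek Flow; the function name `published_transform` and the sample values are illustrative assumptions.

```python
import math


def published_transform(tpm_value, divisor=10.0, offset=1.0):
    """Apply log2(TPM / divisor + offset) to a single TPM value."""
    # Order of operations mirrors the dialog: Divide by, Add, Log.
    return math.log2(tpm_value / divisor + offset)


# Hypothetical TPM values chosen so the results are easy to check by hand.
values = [0.0, 10.0, 70.0]
transformed = [published_transform(v) for v in values]
print(transformed)  # [0.0, 1.0, 3.0]
```

Adding 1 before taking the logarithm keeps a TPM of 0 at 0 after the transformation, so unexpressed genes remain identifiable in later filtering steps.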

Select Finish to perform normalization

A Normalize counts task node and a Normalized counts data node will be added to the Analyses tab. Initially, the nodes will be semi-transparent to indicate that they have been queued, but not completed. A progress bar will appear on the Normalize counts task node to indicate that the task is running (Figure 13).

Figure 13. Queued or running tasks are shown as semi-transparent nodes in the Analyses tab

Most tasks can be queued up on data nodes that have not yet been generated, so you can wait for the normalization step to complete, or proceed to the next section.

Filtering cells in single cell RNA-Seq data

An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. These include doublets and cells damaged during cell isolation.

Click on the Normalized counts data node
Click on the QA/QC section of the task menu
Click on Single cell QA/QC (Figure 14)

Figure 14. Invoking the Single cell QA/QC task

One metric for assessing cell quality is the percentage of mitochondrial reads. If a cell has a high percentage of mitochondrial reads, it is likely undergoing apoptosis and should be excluded from analysis. To calculate the mitochondrial reads percentage, the counts matrix needs to be associated with a relevant genome assembly and a gene/feature annotation with mitochondrial transcripts (Ensembl or GENCODE).
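
To make the metric concrete, here is a minimal sketch of how a mitochondrial reads percentage could be computed for one cell. It assumes genes are identified by symbol and that mitochondrial genes carry the `MT-` prefix, as is conventional for human gene symbols; the function name and the toy counts are hypothetical, and Partek Flow relies on the selected annotation rather than on this prefix rule.

```python
def mito_percent(cell_counts, prefix="MT-"):
    """Return the percentage of a cell's reads assigned to mitochondrial genes.

    cell_counts: dict mapping gene symbol -> read count for one cell.
    prefix: assumed naming convention for mitochondrial genes.
    """
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(
        count
        for gene, count in cell_counts.items()
        if gene.upper().startswith(prefix)
    )
    return 100.0 * mito / total


# Toy cell (made-up counts): 10 of 100 reads map to mitochondrial genes.
cell = {"GAPDH": 60, "ACTB": 30, "MT-CO1": 8, "MT-ND1": 2}
pct = mito_percent(cell)  # 10.0
```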

Select Homo sapiens (human) - hg19 from the Assembly drop-down menu
Select Ensembl Transcripts release 75 from the drop-down menu
Click Finish (Figure 15)

Figure 15. Specifying the assembly and annotation for Single cell QA/QC

A task node, Single cell QA/QC, is produced.

Click the Single cell QA/QC node
Click Task report on the task menu (Figure 16)

Figure 16. Selecting the task report for any task node opens a report with any tables or charts the task produced

The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 17).

Figure 17. Each cell is shown as a point on the plot.

For this data set, there are two plots: number of reads per cell and number of detected genes per cell. Typically, there is a third plot showing the percentage of mitochondrial reads per cell, but mitochondrial transcripts were not included in the data set by the study authors.
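
The two plotted measures can be sketched as simple per-cell summaries of the expression matrix. In the example below, the matrix layout (one list of per-gene values per cell), the cell names, and the rule that a gene counts as detected when its value is greater than 0 are all assumptions made for illustration.

```python
def qc_metrics(matrix):
    """Compute per-cell QC summaries.

    matrix: dict mapping cell ID -> list of per-gene values.
    Returns a dict mapping cell ID -> total counts and detected genes.
    """
    metrics = {}
    for cell_id, values in matrix.items():
        metrics[cell_id] = {
            # Sum of all values in the cell (reads per cell).
            "total_counts": sum(values),
            # Number of genes with a value above zero (assumed definition).
            "detected_genes": sum(1 for v in values if v > 0),
        }
    return metrics


# Toy matrix (hypothetical cells and values).
toy_matrix = {
    "cell_1": [5, 0, 3, 0],
    "cell_2": [1, 1, 1, 1],
}
toy_metrics = qc_metrics(toy_matrix)
```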

Each point on the plots is a cell, and the violins illustrate the distribution of cell values. Cells can be filtered either by drawing a gate on one of the plots or by setting thresholds using the filters below the plots. Here, we will apply a filter for the number of read counts.

Set the Read counts filter to Keep cells between 8000 and 20500 reads

The plot will be shaded to reflect the gate. Cells that are excluded will be shown as black dots on both plots (Figure 18).

Figure 18. Previewing a filter using the Single cell QA/QC violin plots
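
The read counts gate amounts to keeping cells whose total falls within a range. The sketch below uses the tutorial's thresholds of 8000 and 20500; whether Partek Flow treats the bounds as inclusive is not stated here, so the inclusive comparison, the function name, and the toy totals are assumptions.

```python
def keep_cells(total_counts, low=8000, high=20500):
    """Return the IDs of cells whose total read count lies within [low, high].

    total_counts: dict mapping cell ID -> total read count.
    Bounds are treated as inclusive (an assumption of this sketch).
    """
    return sorted(
        cell_id
        for cell_id, total in total_counts.items()
        if low <= total <= high
    )


# Toy totals (made-up values): one cell below, one inside, one above the gate.
totals = {"cell_a": 7500, "cell_b": 12000, "cell_c": 21000}
kept = keep_cells(totals)  # ['cell_b']
```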

Because this data set was already filtered by the study authors to include only high-quality cells, this read counts filter is sufficient for this tutorial.

Click Apply filter

A new task, Filter cells, is added to the Analyses tab. This task produces a new Single cell data node (Figure 19).

Figure 19. Applying a cell quality filter

For more information about Single cell QA/QC, please see our user manual section.

Filtering genes in single cell RNA-Seq data

A common task in bulk and single cell RNA-Seq analysis is to filter the data to include only informative genes. Because there is no gold standard for what makes a gene informative, and ideal gene filtering criteria depend on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options.

Click the Single cell data node produced by the Filter cells task
Click Filtering in the task menu
Click Filter features (Figure 20)

Figure 20. Invoking Filter features

There are three categories of filter available - Noise reduction filters, Statistics based filters, and Feature list filters (Figure 21).

The Noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The Statistics based filters are useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The Feature list filter allows you to filter your data set to include or exclude particular genes.
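
As one illustration of a statistics based filter, the sketch below ranks genes by variance across cells and keeps the top n. The data layout, gene names, and the use of population variance are assumptions for this example and do not describe how Partek Flow computes its metrics.

```python
from statistics import pvariance


def top_variable_genes(expression, n):
    """Return the n genes with the highest variance across cells.

    expression: dict mapping gene name -> list of values, one per cell.
    """
    ranked = sorted(
        expression,
        key=lambda gene: pvariance(expression[gene]),
        reverse=True,
    )
    return ranked[:n]


# Toy data (hypothetical genes): B varies most, A not at all.
toy_expression = {
    "A": [0, 0, 0],
    "B": [1, 5, 9],
    "C": [2, 3, 4],
}
top_two = top_variable_genes(toy_expression, 2)  # ['B', 'C']
```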

We will use a Noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file.
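
Conceptually, this filter drops every gene whose value is 0 in all cells. A minimal sketch follows, assuming a gene-to-values layout and treating a value greater than 0 as expressed; the function and gene names are hypothetical.

```python
def drop_unexpressed(expression):
    """Remove genes that have a value of 0 in every cell.

    expression: dict mapping gene name -> list of values, one per cell.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if any(v > 0 for v in values)
    }


# Toy data (made-up genes): GENE_2 is not expressed by any cell.
toy_genes = {
    "GENE_1": [0.0, 1.5, 0.0],
    "GENE_2": [0.0, 0.0, 0.0],
    "GENE_3": [2.0, 0.0, 3.1],
}
filtered = drop_unexpressed(toy_genes)
```

Since log2([TPM/10]+1) maps a TPM of 0 to 0, the same test works on the normalized values as on the original TPM values.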

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
