Partek Flow Documentation

This guide illustrates how to process FASTQ files to obtain a Single cell counts data node, which is the starting point for analysis of single-cell RNA-seq experiments (such as shown in this tutorial). The guide is written for FASTQ files produced using the 10x Genomics Chromium™ Single Cell 3' v2 library prep kit or the Drop-seq / Dolomite Bio prep kit.

We recommend uploading your FASTQ files (fastq.gz) to a folder on the Partek® Flow® server before importing them into a project.

Adding a prep kit

Before processing your FASTQ files, you need to add a prep kit for your single-cell RNA-seq library preparation method. Partek distributes prep kit files for the 10x Genomics Chromium™ Single Cell 3' v2, 10x Genomics Chromium™ Single Cell 3' v3, Drop-seq, and Fluidigm C1 single-cell technologies.

  • Click Settings (Figure 1)

Figure 1. Navigate to settings

  • Click Library file management in the list of Partek Flow components
  • Click the Prep kit files tab
  • Click Add prep kit (Figure 2)

Figure 2. Adding a prep kit in the Library file management page

Option 1: 10x Genomics Chromium™ Single Cell 3' v2

  • Choose 10x Genomics Chromium™ Single Cell 3' v2 from the drop-down menu (Figure 3)
  • Click Create

Figure 3. Partek distributed prep kits are listed in the drop-down menu
The prep kit will be added to the table (Figure 4).

 

Figure 4. The prep kit has been added to the Prep kit files tab

Option 2: Drop-seq or Dolomite Bio

  • Choose Drop-seq from the drop-down menu
  • Click Create (Figure 5)

Figure 5. Partek distributed prep kits are listed in the drop-down menu
The prep kit will be added to the table (Figure 6).

 

Figure 6. The prep kit has been added to the Prep kit files tab

Importing the FASTQ files

  • To proceed, go back to your project and switch to the Data tab. Click Import data (Figure 7)

 
Figure 7. Use the Data tab to add data to a new project
  • Click Automatically create samples from files

The file browser interface will open (Figure 8). 

 

 
Figure 8. Navigate to the folder containing your FASTQ files

Select the FASTQ files using the file browser interface.

Option 1: 10x Genomics Chromium™ Single Cell 3' v2 Library

Select the _R1 and _R2 files for each sample, but do not select any available _I1 files.

Option 2: Drop-seq or Dolomite Bio

Select the _R1 and _R2 files for each sample. 

 

Paired-end reads will be detected automatically, and multiple lanes for the same sample will be combined into a single sample.
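To see what this grouping amounts to, here is a minimal Python sketch. It is not Partek Flow's own detection logic, which is not described in this guide. It assumes the files follow the common Illumina naming pattern <sample>_S<n>_L<lane>_<R1|R2|I1>_001.fastq.gz, and the function name group_fastq_files is hypothetical.

```python
import re
from collections import defaultdict

# Assumed Illumina-style file name:
#   <sample>_S<n>_L<lane>_<R1|R2|I1>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>R1|R2|I1)_\d{3}\.fastq\.gz$"
)


def group_fastq_files(filenames):
    """Group files as {sample: {lane: {"R1": name, "R2": name}}}.

    Index (_I1) files and names that do not match the assumed
    pattern are skipped.
    """
    samples = defaultdict(lambda: defaultdict(dict))
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None or match["read"] == "I1":
            continue
        samples[match["sample"]][match["lane"]][match["read"]] = name
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(lanes) for sample, lanes in samples.items()}
```

Passing the R1, R2, and I1 files of a sample sequenced on two lanes would yield one sample entry with two lanes, each holding an R1/R2 pair and no index file.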

Samples being imported will appear in the data tab of the project (Figure 9).

 

 
Figure 9. Once the import process has begun, samples will appear in the Data tab

To check on import progress, click Queue and choose View queued tasks (Figure 10).


 
Figure 10. You can use the Queue button to access the list of your running or pending tasks

Start and estimated end times are listed in the Queued tasks view (Figure 11). 

 

Figure 11. The Queue lists waiting and running tasks
  • Return to the project. 

When the FASTQ files have finished importing, the Unaligned reads data node will turn from transparent to opaque.

Trimming tags

The first task to run after importing FASTQ files is Trim tags. This task identifies the barcode and UMI of each paired read, so that barcodes and UMIs can be used in UMI deduplication and barcode filtering while being excluded from alignment.
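The idea of identifying tags can be illustrated with a short Python sketch. It is not Partek Flow's implementation. It assumes the barcode and UMI sit at the start of read 1 with the lengths published for these kits (16 bp barcode and 10 bp UMI for 10x Chromium Single Cell 3' v2, 12 bp barcode and 8 bp UMI for Drop-seq); the names TagLayout, LAYOUTS, and split_tags are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class TagLayout(NamedTuple):
    barcode_len: int
    umi_len: int


# Assumed read 1 layouts (barcode first, then UMI). In Partek Flow the
# layout is selected through the prep kit chosen in the Trim tags dialog.
LAYOUTS = {
    "10x Chromium Single Cell 3' v2": TagLayout(barcode_len=16, umi_len=10),
    "Drop-seq": TagLayout(barcode_len=12, umi_len=8),
}


def split_tags(read1_seq: str, layout: TagLayout) -> Optional[Tuple[str, str]]:
    """Return (barcode, umi) from read 1, or None if the read is too short.

    A read that is too short corresponds to a read that cannot be trimmed.
    """
    needed = layout.barcode_len + layout.umi_len
    if len(read1_seq) < needed:
        return None
    barcode = read1_seq[: layout.barcode_len]
    umi = read1_seq[layout.barcode_len : needed]
    return barcode, umi
```

In this sketch only read 1 carries tags, so the paired read 2 would be the sequence passed on to alignment, labeled with the barcode and UMI extracted here.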

  • Click the Unaligned reads data node
  • Click Pre-alignment tools in the toolbox
  • Click Trim tags (Figure 12)

Figure 12. Choose Trim tags from the Pre-alignment tools section of the toolbox

In the Trim tags dialog you can choose your library prep kit from the drop-down menu.

Option 1: 10x Genomics Chromium™ Single Cell 3' v2 Library

  • Choose 10x Chromium Single Cell 3' v2

Option 2: Drop-seq or Dolomite Bio

  • Choose Drop-seq

 

To keep reads that cannot be trimmed, select Keep untrimmed. To save storage space, select Output indexed unaligned reads.

By default, untrimmed reads are not retained and trimmed reads are indexed. 

  • Click Finish to run the task (Figure 13)

Figure 13. Choose your Prep kit from the drop-down menu

The Trim tags task outputs a Trimmed reads data node.

Align trimmed reads to a reference genome

After trimming, the next step is to align the reads to a reference genome.

  • Click Trimmed reads
  • Click Aligners in the toolbox 
  • Click STAR (Figure 14)

Figure 14. STAR is a fast and accurate aligner appropriate for RNA-seq data
  • Choose your species and reference genome using the STAR aligner dialog. Here, we chose hg19 (Figure 15) 

 

Figure 15. Choose your reference genome and set STAR preferences
  • Click Finish to run the task

Alignment generates an Aligned reads data node. You may want to check alignment quality using the Post-alignment QA/QC task. 

Deduplicate UMIs

After alignment, PCR artifacts are removed using Deduplicate UMIs. 

  • Click Aligned reads
  • Click Post-alignment tools in the toolbox 
  • Click Deduplicate UMIs (Figure 16)

Figure 16. Running Deduplicate UMIs removes PCR artifacts


Option 1: 10x Genomics Chromium™ Single Cell 3' v2 Library

  • Choose Retain only one alignment per UMI (Figure 17)

With this option selected, the deduplication process in Partek Flow conforms to the default parameters for UMI deduplication in Cell Ranger by 10x Genomics. The option requires that each alignment be compatible with exactly one gene and retains only one aligned read per UMI. Leaving it unselected allows multiple aligned reads to be retained for each UMI if they align to different sequences.

  • Choose a gene/feature annotation using the drop-down menu. Here, we chose Ensembl Transcripts release 75 (Figure 17)

If you select Retain only one alignment per UMI, you will be prompted to choose a Gene/feature annotation. Please choose the same annotation you plan to use for quantification.

 

Figure 17. Configure preferences for Deduplicate UMIs
  • Click Finish to run the task

A new data node, Deduplicated reads, will be generated. 

Option 2: Drop-seq or Dolomite Bio

Selecting Retain only one alignment per UMI requires an alignment be compatible with exactly one gene and retains only one aligned read per UMI. Leaving Retain only one alignment per UMI unselected allows multiple aligned reads to be retained for each UMI if they align to different sequences. If you select Retain only one alignment per UMI, you will be prompted to choose a Gene/feature annotation. Please choose the same annotation you plan to use for quantification.

  • Choose Retain only one alignment per UMI or leave it unselected
  • Click Finish to run the task

A new data node, Deduplicated reads, will be generated. 
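The difference between selecting and not selecting Retain only one alignment per UMI can be sketched in Python as follows. This is an illustration of the rule as described above, not Partek Flow's implementation. The Alignment record and its locus field are hypothetical, the UMI is assumed to be tracked per barcode, and the sketch simply keeps the first alignment it sees for each key, since this guide does not say which read is retained.

```python
from typing import FrozenSet, List, NamedTuple


class Alignment(NamedTuple):
    barcode: str
    umi: str
    locus: str              # hypothetical label for where the read aligned
    genes: FrozenSet[str]   # genes the alignment is compatible with


def deduplicate(alignments: List[Alignment], retain_one_per_umi: bool) -> List[Alignment]:
    """Remove duplicate alignments that share a barcode and UMI."""
    seen = set()
    kept = []
    for aln in alignments:
        if retain_one_per_umi:
            # Option selected: the alignment must match exactly one gene,
            # and only one alignment is kept per barcode/UMI pair.
            if len(aln.genes) != 1:
                continue
            key = (aln.barcode, aln.umi)
        else:
            # Option unselected: alignments with the same UMI are kept
            # when they align to different sequences.
            key = (aln.barcode, aln.umi, aln.locus)
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept
```

With the option selected, a UMI observed at two loci contributes a single alignment; with it unselected, the same UMI contributes one alignment per locus.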

Filtering barcodes

Ideally, each barcode would correspond to a cell. In reality, many barcodes end up in GEMs or droplets that contain no cells. In the next step, we will filter the sequenced barcodes to include only barcodes that correspond to GEMs or droplets that contained cells.

  • Double-click the Deduplicated reads data node to open its task report. 

The task report of the Deduplicated reads data node is a plot (Figure 18). Barcodes are ordered on the X-axis by number of reads, so that the barcode closest to the Y-axis has the most reads and the barcode furthest from the Y-axis has the fewest reads. The Y-axis value is the number of reads corresponding to each barcode. This type of plot is commonly referred to as a knee plot.

This plot is used to choose a cutoff point between barcodes that correspond to cells and barcodes that do not. Partek Flow automatically calculates a cutoff point, shown by the vertical line on the graph. Barcodes designated as cells are shown in blue, while barcodes designated as background (without cells) are shown in grey. The cutoff can be adjusted by dragging the vertical line across the graph or by using the text fields in the Filter panel on the left-hand side of the plot. In the Filter panel, you can specify the number of cells or the percentage of reads in cells, and the cutoff point will be adjusted to match your criteria. The number of cells and the percentage of reads in cells are updated as the cutoff point changes. To return to the automatically calculated cutoff, click Reset sample filter.

The percentage of reads in cells and the median reads per cell are useful technical quality metrics that can be consulted when optimizing sample handling and cell isolation techniques. 
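The relationship between the cutoff, the number of cells, the percentage of reads in cells, and the median reads per cell can be sketched in Python. This is an illustration only: it does not reproduce how Partek Flow calculates its automatic cutoff, and the function names knee_summary and cells_for_read_fraction are hypothetical.

```python
from statistics import median


def knee_summary(reads_per_barcode, n_cells):
    """Summarize a cutoff that keeps the n_cells barcodes with the most reads."""
    ranked = sorted(reads_per_barcode.values(), reverse=True)
    cells = ranked[:n_cells]
    total = sum(ranked)
    return {
        "cells": len(cells),
        "pct_reads_in_cells": 100.0 * sum(cells) / total if total else 0.0,
        "median_reads_per_cell": median(cells) if cells else 0,
    }


def cells_for_read_fraction(reads_per_barcode, target_pct):
    """Smallest number of top-ranked barcodes holding at least target_pct of reads."""
    ranked = sorted(reads_per_barcode.values(), reverse=True)
    total = sum(ranked)
    running = 0
    for n, count in enumerate(ranked, start=1):
        running += count
        if 100.0 * running >= target_pct * total:
            return n
    return len(ranked)
```

The first function mirrors setting the number of cells in the Filter panel and reading off the resulting metrics; the second mirrors specifying a percentage of reads in cells and letting the cutoff follow.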

The Deduplication summary section indicates the number of initial and deduplicated alignments. 

 

Figure 18. The Deduplication summary helps choose a cutoff between barcodes corresponding to cells and background

One knee plot is generated for each sample. In projects with multiple samples, Next and Back buttons will appear at the top to enable navigation between sample knee plots. Manual filters must be set separately for each sample. 

To view a summary of the currently selected filter settings for all samples, click Summary table. This opens a table showing key metrics for each sample in the project (Figure 19).

 

Figure 19. Deduplication stats for each sample are listed in the summary table

To apply the currently selected filter for all samples, use the Apply filter button on either the Knee plot of any sample or on the Summary table. 

  • Click Apply filter to run the Filter barcodes task

Filter barcodes produces a Filtered reads data node. 

Quantifying barcodes

After deduplicating UMIs and filtering barcodes, we can generate a single cell count matrix by quantifying the reads for each barcode to a gene/feature annotation model.
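Conceptually, the result is a table with one row per barcode and one column per gene. The Python sketch below shows that shape under simplifying assumptions: each retained read has already been assigned to exactly one gene, and the input is a list of (barcode, gene) pairs. It is not Partek Flow's quantification algorithm, and count_matrix is a hypothetical name.

```python
from collections import Counter


def count_matrix(assignments):
    """Build a barcode-by-gene count matrix.

    assignments: iterable of (barcode, gene) pairs, one per retained read.
    Returns (barcodes, genes, matrix), where matrix[i][j] is the number of
    reads from barcodes[i] assigned to genes[j].
    """
    counts = Counter(assignments)
    barcodes = sorted({barcode for barcode, _ in counts})
    genes = sorted({gene for _, gene in counts})
    # Counter returns 0 for barcode/gene pairs that were never observed.
    matrix = [[counts[(barcode, gene)] for gene in genes] for barcode in barcodes]
    return barcodes, genes, matrix
```

Most entries in a real single cell count matrix are zero, so production tools typically store it in a sparse format rather than as nested lists.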

  • Click the Filtered reads data node
  • Click Quantification in the toolbox
  • Click Quantify barcodes (Figure 20)

 

Figure 20. Quantify barcodes to annotation model generates a Single cell counts data node
  • Choose a Gene/feature annotation from the drop-down menu. Here, we chose Ensembl Transcripts release 75
  • Click Finish to run quantification (Figure 21)

Figure 21. Choose a Gene/feature annotation and configure quantification

A Single cell counts data node will be generated (Figure 22).

 

Figure 22. Pipeline from Unaligned reads to Single cell counts

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
