Partek Flow Documentation

...

An important consideration when analyzing UMI data is the introduction of errors into the UMIs themselves during PCR amplification of the original molecule. If these errors are not accounted for and each sequenced UMI is treated as representative of the original UMI, the number of unique molecules can be significantly overestimated. To account for this, the Deduplicate UMIs task uses an implementation of the UMI-tools algorithm described in Smith et al. 2017. Paired-end read support was further improved by incorporating components of the UMI deduplication tool Connor.

The task works by first partitioning reads into groups. Reads are grouped if they align to the same genomic position, have the same strandness, and any barcodes present match within an edit distance of two.
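As a rough illustration of the partitioning step, the sketch below groups reads by alignment position and strand. The `Read` record and `group_reads` function are hypothetical names chosen for this example, not part of Partek Flow, and the barcode-matching condition is deliberately left out.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record used only for this illustration."""
    chrom: str
    pos: int
    strand: str
    umi: str


def group_reads(reads):
    """Partition reads that share chromosome, position, and strand.

    Barcode matching within an edit distance is omitted here for brevity.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)
    return dict(groups)


reads = [
    Read("chr1", 1000, "+", "ACGT"),
    Read("chr1", 1000, "+", "ACGA"),
    Read("chr1", 1000, "-", "ACGT"),
]
groups = group_reads(reads)
```

In this toy input, the two forward-strand reads land in one group and the reverse-strand read in another; UMI comparison would then happen within each group.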

...

Deduplicate UMIs has an alternative setting to more closely match the methods used by 10X Genomics' Cell Ranger pipeline: Retain only one alignment per UMI. Selecting this option changes how the task functions and requires that you specify the genome assembly and gene/feature annotation.

The algorithm first checks whether each aligned read is compatible with a transcript in the annotation file. Here, compatible is defined as 50% or more of the aligned read sequence overlapping the transcript; strand is not considered. Aligned reads that are not compatible with a transcript or are mapped to multiple gene annotations are discarded.
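The 50% overlap rule can be sketched as a simple interval calculation. This is only an illustration under simplifying assumptions: the transcript is treated as a single half-open interval (exon structure is ignored), and the function names are hypothetical rather than taken from Partek Flow.

```python
def overlap_fraction(read_start, read_end, tx_start, tx_end):
    """Fraction of the read covered by the transcript interval.

    Coordinates are half-open [start, end). Strand is ignored,
    matching the description above.
    """
    read_len = read_end - read_start
    if read_len <= 0:
        return 0.0
    overlap = min(read_end, tx_end) - max(read_start, tx_start)
    return max(overlap, 0) / read_len


def is_compatible(read_start, read_end, tx_start, tx_end, threshold=0.5):
    """True when at least `threshold` of the read overlaps the transcript."""
    return overlap_fraction(read_start, read_end, tx_start, tx_end) >= threshold
```

For a 100-base read at positions 100 to 200 and a transcript starting at 150, exactly half of the read overlaps, so the read counts as compatible; shifting the transcript start to 151 drops it below the threshold.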

The occurrence of each barcode and UMI combination is counted to establish the prevalence of each barcode+UMI.
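Counting barcode and UMI combinations amounts to a frequency table. A minimal sketch, assuming each retained read has already been reduced to a `(barcode, umi)` tuple (a hypothetical representation chosen for this example):

```python
from collections import Counter


def count_barcode_umis(pairs):
    """Count how often each (barcode, UMI) combination occurs."""
    return Counter(pairs)


pairs = [
    ("AAAC", "GGTT"),
    ("AAAC", "GGTT"),
    ("AAAC", "GGTA"),
    ("CCCA", "GGTT"),
]
counts = count_barcode_umis(pairs)
```

The resulting counts give the prevalence of each barcode+UMI combination.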

...

This method is also similar to the default method in the Drop-seq Alignment Cookbook (Macosko et al. 2015), which collapses UMI barcodes with a Hamming distance of 1.

This method may output more UMIs than the default behavior because only UMIs within an edit distance of 1 are summarized, whereas UMIs with a greater distance can be linked in the UMI-tools method. For a comparison of the performance of the two approaches, please see the Adjacency (Cell Ranger) and Directional (UMI-tools) methods described in Smith et al. 2017.
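To make the distance comparison concrete, the sketch below computes the Hamming distance between equal-length UMIs. It is not the Partek Flow or UMI-tools implementation; the function name and the example sequences are made up for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Three hypothetical UMIs forming a chain of single mismatches.
first, middle, last = "ACGT", "ACGA", "ACTA"

d_first_middle = hamming_distance(first, middle)
d_middle_last = hamming_distance(middle, last)
d_first_last = hamming_distance(first, last)
```

Here `first` and `last` differ at two positions, so a rule that only considers pairs at distance 1 does not relate them directly, while a method that follows chains of distance-1 neighbors can connect them through `middle`.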

...

Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Research 2017; 27(3): 491-499. https://doi.org/10.1101/gr.209601.116

Connor, University of Michigan BRCF Bioinformatics Core https://github.com/umich-brcf-bioinf/Connor

Cell Ranger Algorithms Overview, 10X Genomics https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview

Drop-seq Alignment Cookbook v1.2 Jan 2016, James Nemesh, Steve McCarroll’s lab, Harvard Medical School http://mccarrolllab.com/wp-content/uploads/2016/03/Drop-seqAlignmentCookbookv1.2Jan2016.pdf

Macosko E, Basu A, Satija R, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 2015; 161(5): 1202-1214. https://doi.org/10.1016/j.cell.2015.05.002

 
