Partek Flow Documentation

...

An important consideration when analyzing UMI data is that errors are introduced into the UMIs themselves during PCR amplification of the original molecule. If these errors are not accounted for and each sequenced UMI is treated as representative of the original UMI, the number of unique molecules can be significantly overestimated. To account for this, the Deduplicate UMIs task uses an implementation of the UMI-tools algorithm described in Smith et al. 2017. Paired-end read support was further improved by incorporating components of the deduplication tool Connor.
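
To illustrate why uncorrected UMI errors inflate molecule counts, the following is a minimal sketch of a directional-style collapse in the spirit of Smith et al. 2017. It is not Partek Flow's implementation. The function names, the example UMIs, and their counts are hypothetical, and all UMIs are assumed to have equal length.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts: dict) -> dict:
    """Map every observed UMI to the UMI assumed to be its original.

    Assumption (simplified directional rule): a UMI is absorbed by a
    neighbour one mismatch away when
    count(neighbour) >= 2 * count(umi) - 1.
    """
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    ordered = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    assigned = {}
    for root in ordered:
        if root in assigned:
            continue
        assigned[root] = root
        stack = [root]
        while stack:
            parent = stack.pop()
            for other in ordered:
                if other in assigned:
                    continue
                close = hamming(parent, other) == 1
                abundant = umi_counts[parent] >= 2 * umi_counts[other] - 1
                if close and abundant:
                    assigned[other] = root
                    stack.append(other)
    return assigned


# Hypothetical reads at one position: two real molecules plus two PCR errors.
sequenced = ["AAAA"] * 10 + ["AAAT"] + ["AACA"] + ["CCCC"] * 8
umi_counts = Counter(sequenced)

naive_estimate = len(umi_counts)
corrected_estimate = len(set(collapse_umis(umi_counts).values()))
print(naive_estimate, corrected_estimate)
```

In this toy input, counting every distinct UMI reports four molecules, while collapsing the low-count single-mismatch UMIs into their abundant neighbours reports two.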

The task works by first partitioning reads into groups. Reads are placed in the same group if they align to the same genomic position, are on the same strand, and any barcodes present match within an edit distance of two.
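
The grouping criteria above can be sketched as follows. This is an illustrative simplification rather than the task's actual code. The `Read` record, its field names, and the greedy comparison against the first read of each group are assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record; real alignments carry more fields.
    name: str
    chrom: str
    pos: int
    strand: str
    barcode: str


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def group_reads(reads, max_distance=2):
    """Partition reads by position and strand, then by barcode similarity."""
    by_locus = defaultdict(list)
    for read in reads:
        by_locus[(read.chrom, read.pos, read.strand)].append(read)

    groups = []
    for locus_reads in by_locus.values():
        locus_groups = []
        for read in locus_reads:
            for group in locus_groups:
                # Assumption: compare against the group's first read only.
                if edit_distance(read.barcode, group[0].barcode) <= max_distance:
                    group.append(read)
                    break
            else:
                # No existing group is close enough, so start a new one.
                locus_groups.append([read])
        groups.extend(locus_groups)
    return groups


reads = [
    Read("r1", "chr1", 100, "+", "ACGTACGT"),
    Read("r2", "chr1", 100, "+", "ACGTACGA"),  # one mismatch from r1
    Read("r3", "chr1", 100, "+", "TTTTTTTT"),  # same locus, distant barcode
    Read("r4", "chr1", 100, "-", "ACGTACGT"),  # opposite strand
    Read("r5", "chr1", 200, "+", "ACGTACGT"),  # different position
]
group_names = [[r.name for r in g] for g in group_reads(reads)]
print(group_names)
```

Only r1 and r2 share a group here, because they agree on position and strand and their barcodes differ by a single base.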

...

Deduplicate UMIs has an alternative setting that more closely matches 10x Genomics' Cell Ranger pipeline: Retain only one alignment per UMI. Selecting this option changes how the task functions and requires that you specify the genome assembly and gene/feature annotation.

...

Smith T, Hegar A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Research 2017; 27(3): 491-499. https://doi.org/10.1101/gr.209601.116

Connor, University of Michigan BRCF Bioinformatics Core https://pypi.org/project/Connor/

Cell Ranger Algorithms Overview, 10x Genomics https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview

...