TF-IDF normalization

Latent semantic indexing (LSI) was first introduced for the analysis of scATAC-seq data by Cusanovich et al. 2018[1]. LSI combines term frequency-inverse document frequency (TF-IDF) normalization followed by singular value decomposition (SVD). Partek Flow wraps Signac's TF-IDF normalization for single cell ATAC-seq data. It is a two-step normalization procedure that normalizes across cells, to correct for differences in cellular sequencing depth, and across peaks, to give higher values to rarer peaks[2].
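As a rough illustration of the two normalization steps and the SVD that follows, here is a minimal NumPy sketch. It assumes the log(TF × IDF) variant with a scale factor of 10,000, which we believe corresponds to Signac's default TF-IDF method, and it uses a plain NumPy SVD (Signac uses a truncated solver and may scale its embeddings differently). The helper names `tfidf_log` and `lsi` are hypothetical, this is not Partek Flow's implementation, and it assumes every peak and every cell has at least one count.

```python
import numpy as np


def tfidf_log(counts, scale_factor=1e4):
    """Sketch of log(TF x IDF) normalization (assumed Signac-style default).

    counts: peaks x cells array of raw counts; assumes no all-zero peaks or cells.
    """
    counts = np.asarray(counts, dtype=float)
    # TF: divide each cell (column) by its total counts to correct sequencing depth
    tf = counts / counts.sum(axis=0, keepdims=True)
    # IDF: number of cells divided by each peak's total counts; rarer peaks get larger weights
    idf = counts.shape[1] / counts.sum(axis=1, keepdims=True)
    return np.log1p(tf * idf * scale_factor)


def lsi(counts, n_components=2):
    """TF-IDF followed by SVD; returns a cells x n_components embedding."""
    norm = tfidf_log(counts)
    u, s, vt = np.linalg.svd(norm, full_matrices=False)
    # Rows of vt.T correspond to cells; scale each component by its singular value
    return vt[:n_components].T * s[:n_components]
```

In practice the first LSI component is often strongly correlated with sequencing depth and is frequently excluded from downstream steps, which is one reason the embedding is returned component by component here.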

TF-IDF normalization can be invoked in Flow from the Normalization and scaling section of the toolbox, which is available when you click any single cell counts data node (Figure 1).

Figure 1. Scran deconvolution task in the Normalization and scaling section in Flow.

To run Scran deconvolution:

Click a single cell counts data node
Click the Normalization and scaling section in the toolbox
Click Scran deconvolution

The task dialog is straightforward. The first Scran deconvolution dialog asks you to select a cluster name from a drop-down list that includes all the attributes for this dataset. The selected attribute is an optional factor specifying which cells belong to which cluster, so that deconvolution is performed within clusters (Figure 2). To run the task with default settings, simply click the Finish button.

Figure 2. Interface of the Scran deconvolution task in Partek Flow. Example attributes are shown in the drop-down list when Cluster name is checked.

The output of Scran deconvolution is a new data node in which the counts have been normalized by the pool-based size factor of each cell and then log2 transformed. This normalized matrix can then be used for downstream analysis and visualization (Figure 3).
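The normalization step described above amounts to dividing each cell's counts by its size factor and taking a log2. The minimal sketch below assumes a pseudocount of 1 and size factors centred at a mean of 1 (the defaults of scater's `logNormCounts` in R); Partek Flow's exact choices are not stated on this page, and `log_normalize` is a hypothetical helper.

```python
import numpy as np


def log_normalize(counts, size_factors, pseudocount=1.0):
    """Divide each cell's counts by its size factor, then log2 transform.

    counts: genes x cells array; size_factors: one value per cell (column).
    The pseudocount and centring are assumptions, not Partek Flow settings.
    """
    counts = np.asarray(counts, dtype=float)
    sf = np.asarray(size_factors, dtype=float)
    sf = sf / sf.mean()  # centre at 1 so normalized values stay on the count scale
    return np.log2(counts / sf + pseudocount)  # sf broadcasts across columns (cells)
```

For example, two cells with size factors 1 and 2 are centred to 2/3 and 4/3, so counts of 2 and 4 for the same gene both become 3 before the log, which is the intended effect of removing cell-specific scaling.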

Figure 3. Example workflows to demonstrate downstream analysis and visualization of Scran deconvolution output.

Other parameters in this task that you can adjust include:

Pool size: A numeric vector of pool sizes, i.e., number of cells per pool.

Max cluster size: An integer scalar specifying the maximum number of cells in each cluster.

Enforce positive estimates: A logical scalar indicating whether linear inverse models should be used to enforce positive estimates.

Scaling factor: A numeric scalar containing scaling factors to adjust the counts prior to computing size factors.
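To make the pooling idea behind these parameters more concrete, here is a heavily simplified NumPy sketch of pool-based deconvolution in the spirit of Lun et al. (2016). It is not scran's or Partek Flow's code: the function `pooled_size_factors`, its small default pool sizes (scran itself uses much larger pools, by default 21 to 101 cells), the sliding-window pooling over cells sorted by library size, and the per-cell equation weight are all illustrative assumptions. The Max cluster size, Enforce positive estimates, and Scaling factor options are not modelled.

```python
import numpy as np


def pooled_size_factors(counts, pool_sizes=(4, 6, 8), cell_weight=1e-3):
    """Simplified pool-and-deconvolve size factors (illustrative only).

    counts: genes x cells array of raw counts.
    pool_sizes: numbers of cells per pool (hypothetical defaults).
    cell_weight: weight of per-cell equations that keep the system solvable.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    if max(pool_sizes) > n_cells:
        raise ValueError("pool sizes must not exceed the number of cells")

    ref = counts.mean(axis=1)                 # average "pseudo-cell" profile
    keep = ref > 0                            # ratios are undefined for all-zero genes
    counts, ref = counts[keep], ref[keep]

    order = np.argsort(counts.sum(axis=0))    # cells sorted by library size
    rows, rhs = [], []
    for size in pool_sizes:
        for start in range(n_cells):
            # Sliding window over the sorted cells, wrapping around at the end
            members = order[(start + np.arange(size)) % n_cells]
            pooled = counts[:, members].sum(axis=1)
            row = np.zeros(n_cells)
            row[members] = 1.0
            rows.append(row)
            # The pool's median ratio to the reference is modelled as the
            # sum of its member cells' size factors
            rhs.append(np.median(pooled / ref))

    # Low-weight per-cell equations make the linear system full rank
    for j in range(n_cells):
        row = np.zeros(n_cells)
        row[j] = cell_weight
        rows.append(row)
        rhs.append(cell_weight * np.median(counts[:, j] / ref))

    theta, *_ = np.linalg.lstsq(np.vstack(rows), np.asarray(rhs), rcond=None)
    return theta / theta.mean()               # centre size factors at 1
```

Each pool contributes one equation stating that the sum of its members' size factors equals the pool's median ratio to the average cell. Summing many cells reduces the zero counts that make per-cell ratios unreliable, which is the motivation given by Lun et al.; solving the linear system then "deconvolves" the pool estimates back into one size factor per cell.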

References

Lun, A. T., K. Bach, and J. C. Marioni. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 2016. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0947-7

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
