SVD

To analyze scATAC-seq data, Partek Flow introduced a new technique, LSI (latent semantic indexing). LSI combines term frequency-inverse document frequency (TF-IDF) normalization with a subsequent singular value decomposition (SVD). This returns a reduced-dimension representation of a matrix. Although SVD and principal components analysis (PCA) are two different techniques, SVD has a close connection to PCA: PCA is simply an application of the SVD. If you are more familiar with scRNA-seq, you can think of the SVD output as analogous to the output of PCA. Similarly, the statistical interpretation of the singular values is the variance in the data explained by the various components. The singular values produced by the SVD are ordered from largest to smallest and, when squared, are proportional to the amount of variance explained by a given singular vector.
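The connection between SVD and PCA described above can be checked numerically. The following is a minimal sketch in Python with NumPy, not Partek Flow's implementation; the toy matrix and all variable names are assumptions made for illustration. It assumes the data are centered per feature, in which case the squared singular values divided by (n - 1) equal the eigenvalues of the covariance matrix that PCA reports.

```python
import numpy as np

# Toy data (hypothetical): 100 observations x 6 features with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])

# Center each feature; PCA is the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
u, s, vt = np.linalg.svd(Xc, full_matrices=False)

# PCA the "classical" way: eigenvalues of the covariance matrix,
# reversed so that they run from largest to smallest.
cov = np.cov(Xc, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]

# Squared singular values, scaled by (n - 1), are the PCA eigenvalues.
var_from_svd = s**2 / (X.shape[0] - 1)

# Fraction of variance explained by each singular vector.
explained = s**2 / np.sum(s**2)
```

Here `var_from_svd` and `eigvals` agree up to floating-point error, and `explained` sums to 1. Without centering, the squared singular values still rank the components but no longer correspond exactly to variances.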

The SVD task in Flow can be invoked from the Normalization and scaling section after clicking any single cell counts data node (Figure 1). We recommend running SVD on normalized data, particularly TF-IDF normalized counts for scATAC-seq analysis.

Figure 1. SVD task in Flow
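The recommended order of operations (TF-IDF normalization, then SVD) can be sketched in Python with NumPy. This is an illustration only and does not reproduce Partek Flow's internals: the log-scaled TF-IDF variant, the scale factor of 1e4, the function names tf_idf and lsi, and the toy count matrix are all assumptions.

```python
import numpy as np

def tf_idf(counts, scale=1e4):
    """TF-IDF normalize a features (peaks) x cells count matrix.

    Assumed variant: log(1 + TF * IDF * scale). Other variants exist.
    """
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each count divided by the total counts of its cell (column).
    tf = counts / counts.sum(axis=0, keepdims=True)
    # Inverse document frequency: number of cells / total counts of each peak (row).
    idf = counts.shape[1] / counts.sum(axis=1, keepdims=True)
    return np.log1p(tf * idf * scale)

def lsi(counts, n_components):
    """Return a cells x n_components embedding and all singular values."""
    norm = tf_idf(counts)
    # Transpose so that cells are rows, then decompose.
    u, s, vt = np.linalg.svd(norm.T, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]
    return embedding, s

# Toy counts (hypothetical): 50 peaks x 20 cells; the +1 avoids empty rows/columns.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(50, 20)) + 1
embedding, s = lsi(counts, n_components=5)
```

The rows of `embedding` are the reduced-dimension coordinates of the cells, and `s` holds the singular values in decreasing order.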

The PCA task creates a new task node. To open it and see the result, do one of the following: select the PCA task node, proceed to the context-sensitive menu and go to Task result; or double-click the PCA task node. The report contains eigenvalues, PC projections, component loadings, and mapping error information for the first three PCs.

When the PCA node is opened in Data viewer, a 3D scatterplot is shown by default (Figure 2). Each dot on the plot represents an observation, and the first three PCs are shown on the X-, Y-, and Z-axes respectively, with the information content of each PC in parentheses.

As an exploratory tool, the PCA scatterplot is used to view any groupings in the data set and generate hypotheses based on the outcome, or to spot possible outliers.

Figure 2. Principal components analysis plot in 3D. Each dot is a sample. The axes show the first three principal components, with the fraction of explained variance in parentheses. The legend on the right shows the effect of coloring and sizing on the appearance of the dots (an example).

To rotate the plot, left-click and drag. To zoom in or out, use the mouse wheel. To move the legend to a different location on the viewer, click and drag it.

Detailed configuration options for the PCA plot are described under Help > How-to videos > Data viewer.

In Data viewer, select the PCA data node in the Data section of the control panel (left panel) and drag it to the plot section (Figure 3). You will then have the option to draw a scree plot or tables.

Figure 3. Drag PCA data node to plot

When you choose the Scree plot icon, a 2D plot is drawn in which the X-axis represents the PCs and the Y-axis represents the eigenvalues (Figure 4).

Figure 4. Scree plot

When you mouse over a point on the line, detailed information about that PC is displayed. The scree plot shows how much variation each PC represents, so it is often used to determine the number of principal components to keep for downstream analysis (e.g. tSNE, UMAP, graph-based clustering). The "elbow" of the graph, where the eigenvalues seem to level off, should be considered as the cutoff point for downstream analysis.
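The elbow is usually judged by eye, but it can also be approximated programmatically. The sketch below uses one common heuristic, the point farthest from the straight line joining the first and last points of the scree curve. This is an assumption for illustration and not the method used by Partek Flow; the function name elbow_index and the example eigenvalues are hypothetical.

```python
import numpy as np

def elbow_index(eigenvalues):
    """Return the 0-based index of the point farthest from the chord
    joining the first and last points of the scree curve."""
    y = np.asarray(eigenvalues, dtype=float)
    if y.size < 3 or y.max() == y.min():
        raise ValueError("need at least 3 eigenvalues that are not all equal")
    x = np.arange(y.size, dtype=float)
    # Rescale both axes to [0, 1] so the distance does not depend on units.
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    # Direction of the chord from the first to the last point.
    d = np.array([xn[-1] - xn[0], yn[-1] - yn[0]])
    # Perpendicular distance of every point to that chord (2D cross product).
    rel_x, rel_y = xn - xn[0], yn - yn[0]
    dist = np.abs(d[0] * rel_y - d[1] * rel_x) / np.hypot(d[0], d[1])
    return int(np.argmax(dist))

# Hypothetical eigenvalues that drop quickly and then level off.
eigenvalues = [10.0, 6.0, 3.0, 1.0, 0.9, 0.8, 0.7, 0.6]
elbow = elbow_index(eigenvalues)  # 3, i.e. the fourth PC
```

The result is only a starting point; it is worth confirming the cutoff against the scree plot itself, since different heuristics can disagree on curves without a sharp bend.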

The PCA data node can also be drawn as tables. When you choose the Table icon, the component loadings matrix is displayed in the viewer (Figure 5).

Figure 5. Component loadings are the correlation coefficients between the features and PCs.

In the table, each row is a feature, each column represents a PC, and each value is the correlation coefficient between them. The table can be downloaded as a text file. To display the projection table, go to the Content section of the configuration panel and choose the PCA projections option from the Data table drop-down list (Figure 6).

Figure 6. PCA projection table

In this table, each row is an observation, each column is a PC, and the values are the PC scores. The table can be downloaded as a text file.
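To make the two tables concrete, the following minimal NumPy sketch computes PC scores (the projection table) and component loadings (correlations between features and PCs) for a small toy matrix. It illustrates the definitions given above and is not Partek Flow's code; the toy data, the per-feature centering, and all variable names are assumptions.

```python
import numpy as np

# Toy data (hypothetical): 30 observations x 4 features, two of them correlated.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
X[:, 1] += 2 * X[:, 0]

# Center each feature and decompose.
Xc = X - X.mean(axis=0)
u, s, vt = np.linalg.svd(Xc, full_matrices=False)

# Projection table: rows are observations, columns are PCs, values are PC scores.
scores = Xc @ vt.T

# Loadings table: rows are features, columns are PCs, values are the
# correlation coefficients between a feature and the scores of a PC.
n_features, n_pcs = Xc.shape[1], scores.shape[1]
loadings = np.empty((n_features, n_pcs))
for j in range(n_features):
    for k in range(n_pcs):
        loadings[j, k] = np.corrcoef(Xc[:, j], scores[:, k])[0, 1]
```

The scores equal `u * s`, so they can be read directly from the SVD, and every loading lies between -1 and 1 because it is a correlation coefficient.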

References

Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587.e29. doi:10.1016/j.cell.2021.04.048
https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
