Principal component analysis (PCA) is an exploratory technique used to describe the structure of high-dimensional data by reducing its dimensionality. It is a linear transformation that converts n original variables (typically genes or transcripts) into n new variables, called principal components (PCs), which have three important properties:
- PCs are ordered by the amount of variance explained
- PCs are uncorrelated
- PCs explain all variation in the data
PCA is a principal axis rotation of the original variables that preserves the variation in the data. Therefore, the total variance of the original variables is equal to the total variance of the PCs.
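The three properties above can be checked numerically. The following is a minimal sketch using NumPy on made-up toy data; it is not the implementation used by the PCA task, and the function name `pca` and the variable names are chosen here for illustration only. It computes PCs from the eigen-decomposition of the covariance matrix and exposes the quantities needed to verify the ordering, the lack of correlation, and the preserved total variance.

```python
import numpy as np

def pca(data):
    """PCA of a samples x variables matrix via the covariance matrix.

    Returns the PC scores (samples x PCs) and the variance explained
    by each PC, ordered from largest to smallest.
    """
    centered = data - data.mean(axis=0)          # centre each variable
    cov = np.cov(centered, rowvar=False)         # variables x variables
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]        # eigh returns ascending order
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = centered @ eigenvectors             # rotate onto the principal axes
    return scores, eigenvalues

# Hypothetical toy data: 50 samples, 4 correlated variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

scores, explained = pca(data)
pc_cov = np.cov(scores, rowvar=False)            # covariance between PCs
off_diagonal = pc_cov - np.diag(np.diag(pc_cov))

total_original = data.var(axis=0, ddof=1).sum()  # total variance of the variables
total_pcs = explained.sum()                      # total variance of the PCs

print("variance explained per PC:", explained)
print("largest covariance between PCs:", np.abs(off_diagonal).max())
print("total variance (original, PCs):", total_original, total_pcs)
```

In this sketch the printed per-PC variances decrease from left to right, the covariance between different PCs is zero up to floating-point error, and the two totals agree, which mirrors the statement that the rotation preserves the variation in the data.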
If read quantification (i.e. mapping to a transcript model) was performed with the Partek E/M algorithm, PCA can be invoked on a quantification output data node (Gene counts or Transcript counts) or, after normalisation, on a Normalized counts data node. Select a node on the canvas and then select PCA in the Visualization section of the toolbox. The PCA task creates a new task node. To open it and see the result, either select the PCA task node and choose Task result in the toolbox, or double-click the PCA task node.
Figure 1
PCA is also available for ERCC assessment: open the QA/QC on ERCC controls task node and push View PCA in the lower left corner.
...
As an exploratory tool, the PCA scatterplot is used to view any groupings in the data set, to generate hypotheses based on the outcome, and to spot possible outliers.
Figure 2
To rotate the plot, left-click and drag. To zoom in or out, use the mouse wheel. To move the legend to a different location on the viewer, click and drag it.
The plot can be customised using the controls on the left (Figure 3). Color by lists the sample attributes shown in the Data tab; alternatively, set it to Fixed to give all the dots the same color. The Size by option works in the same way, but affects the dot size.
Figure 3
The Connect by option is particularly useful for dependent study designs, where connecting lines highlight the samples that come from the same biological source. The example in Figure 4 depicts the results of a study in which each RNA sample was processed by both RNA-seq and gene expression array; the lines connect the same samples.
Figure 4
Figure 5
...
Although the first three PCs are shown by default, you can plot any of the first nine PCs by using the X, Y, and Z drop-down lists.
Once you are pleased with the appearance of the plot, push the Save image button to save it to the local machine. The resulting dialog (Figure 7) controls the resolution of the image file. The image is saved in PNG format, and the default filename is PCA plot.png. If you are not happy with your edits, you can revert to the initial view by pushing the Reset button.
...