
Principal component analysis (PCA) is a way to explore the overall similarity between samples, visualize possible groupings within the data set, and detect outliers.

Select PCA Scatter Plot from the QA/QC section of the workflow

Figure 1. PCA scatter plot of the samples

Each dot on the plot corresponds to a single sample and can be thought of as a summary of all normalized marker intensities for the sample. The first categorical column is used to color the plot; here, tumor samples are shown in red and normal samples are shown in blue.

To better view the data, we can rotate the plot.

Select the Rotate Mode icon to activate Rotate Mode
Click and drag to rotate the plot

Rotating the plot lets us look for outliers along each of the three principal components (PC1-3). The percentage of the total variation explained by each PC is shown next to its axis label, and the chart label shows the combined percentage of the total variation explained by the displayed PCs.
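For readers who want to see where the per-axis percentages come from, the following is a minimal sketch of PCA computed with a singular value decomposition in NumPy. It is not the software's own implementation: the function name `pca_scores`, the synthetic data, and the choice to center but not scale the columns are all assumptions made for illustration.

```python
import numpy as np

def pca_scores(intensities, n_components=3):
    """Project samples (rows) onto the top principal components.

    Hypothetical helper: returns the sample coordinates on each PC and
    the percentage of total variation explained by each PC.
    """
    X = np.asarray(intensities, dtype=float)
    # Center each marker (column) so the PCs describe variation around the mean.
    centered = X - X.mean(axis=0)
    # Singular values are returned in descending order.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2
    percent = 100.0 * variances / variances.sum()
    # Coordinates of each sample on the first n_components PCs.
    scores = U[:, :n_components] * s[:n_components]
    return scores, percent[:n_components]

# Synthetic stand-in for normalized marker intensities: 20 samples x 500 markers.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 500))

scores, percent = pca_scores(data)
print("Percent explained by PC1-3:", percent)
print("Combined percent for displayed PCs:", percent.sum())
```

Each row of `scores` corresponds to one dot on the plot, and `percent.sum()` plays the role of the combined percentage shown in the chart label.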

We can see that the peripheral blood samples (normal) cluster together whereas the cancer tissue samples (tumor) are more dispersed and show considerable variability. This corresponds well with the known genomic variability of cancer cells.

To view the similarity of paired normal and tumor samples from the same patient, we can connect dots by Subject ID.

Select 4. SubjectID from the Connect by drop-down menu in the upper right-hand corner of the plot tab

Paired tumor and normal samples are now connected by lines, illustrating the range of differences between normal and tumor copy number in the data set (Figure 2).

Figure 2. PCA scatter plot with paired tumor and normal samples connected by Subject ID
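The length of each connecting line can be read as the distance between a subject's tumor and normal samples in the space of the displayed PCs. As a rough sketch of that idea, the snippet below groups sample coordinates by subject and measures the tumor-to-normal distance. The function name `paired_distances`, the labels "tumor" and "normal", and the example coordinates are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

def paired_distances(points, subject_ids, sample_types):
    """Return {subject: Euclidean distance between its tumor and normal sample}.

    Hypothetical helper: `points` holds each sample's coordinates on the
    displayed PCs, in the same order as `subject_ids` and `sample_types`.
    """
    by_subject = defaultdict(dict)
    for point, subject, kind in zip(points, subject_ids, sample_types):
        by_subject[subject][kind] = point

    distances = {}
    for subject, pair in by_subject.items():
        # Only subjects with both sample types can be connected by a line.
        if "tumor" in pair and "normal" in pair:
            distances[subject] = math.dist(pair["tumor"], pair["normal"])
    return distances

# Made-up PC1-3 coordinates for two subjects.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 1.0, 1.0), (1.0, 1.0, 3.0)]
subjects = ["S1", "S1", "S2", "S2"]
types = ["normal", "tumor", "normal", "tumor"]

result = paired_distances(points, subjects, types)
print(result)
```

A subject with a long line (large distance) has a tumor sample that differs more from its matched normal sample along the displayed PCs.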