...
- Select Plot PCA Scatter Plot from the QA/QC section of the Gene Expression workflow. A Scatter Plot tab containing your PCA plot will open (Figure 1).
Figure 1. PCA Scatter Plot tab
...
- In the Scatter Plot tab, select the Rendering Properties icon and configure the plot as shown (Figure 2)
- Color the points by column 4. Tissue and size the points by column 3. Type
- Select OK
Figure 2. Configuring the PCA scatter plot: Color by Tissue, size by Type
Notice now that the data are clustered by tissue (Figure 3).
Figure 3. PCA scatter plot configured with color by Tissue, size by Type
Another way to see the clustering pattern is to draw an ellipse around each Tissue group.
- Open the Plot Rendering Properties dialog and select the Ellipsoids tab
- Select Add Ellipse/Ellipsoid
- Select Ellipse in the Add Ellipse/Ellipsoid... dialog
- Double-click Tissue in the Categorical Variable(s) panel to move it to the Grouping Variable(s) panel (Figure 4)
- Select OK to close the Add Ellipse/Ellipsoid... dialog and select OK again to exit the Plot Rendering Properties dialog
Figure 4. Adding Ellipses to PCA Scatter Plot
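The steps above draw one ellipse per Tissue group. The tutorial does not say how the software computes these ellipses, so the following is only a minimal sketch of one common approach, a covariance ellipse, written in Python with NumPy. The function name `covariance_ellipse`, the `n_std` parameter, and the sample points are all hypothetical and are not part of the tutorial dataset or the application.

```python
import numpy as np

def covariance_ellipse(points, n_std=2.0):
    """Summarize 2-D points as an ellipse: (center, semi_axes, angle_deg).

    Assumption: the ellipse is derived from the sample covariance matrix;
    the plotting software may use a different rule.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)

    # eigh returns eigenvalues in ascending order; put the major axis first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    eigvecs = eigvecs[:, order]

    semi_axes = n_std * np.sqrt(eigvals)
    # Orientation of the major axis relative to the x-axis, in degrees.
    angle_deg = np.degrees(np.arctan2(eigvecs[1, 0], eigvecs[0, 0]))
    return center, semi_axes, angle_deg

# Hypothetical PC1/PC2 coordinates for the samples of one group.
rng = np.random.default_rng(0)
group_points = rng.normal(size=(30, 2)) * np.array([3.0, 1.0]) + np.array([10.0, -5.0])

center, semi_axes, angle_deg = covariance_ellipse(group_points)
print("center:", np.round(center, 2))
print("semi-axes:", np.round(semi_axes, 2))
print("angle (degrees):", round(float(angle_deg), 1))
```

Calling the function once per group gives one ellipse per group, which mirrors the effect of moving Tissue into the Grouping Variable(s) panel.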
By rotating this PCA plot, you can see that the data are separated by tissue and that, within some tissues, the Down syndrome samples and normal samples are also separated. For example, in the Astrocyte and Heart tissues, the Down syndrome samples (small dots) are on the left, and the normal samples (large dots) are on the right (Figure 5).
Figure 5. PCA scatter plot with ellipses, rotated to show separation by Type
PCA is an example of exploratory data analysis and is useful for identifying outliers and major effects in the data. From the scatter plot, you can see that tissue is the largest source of variation. Many genes are expressed differently among the four tissues, but fewer genes are expressed differently between types (Down syndrome and normal) across the whole chip (genome).
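To make the idea of a dominant source of variation concrete, here is a minimal sketch in Python with NumPy. It is not the software's implementation and does not use the tutorial dataset. It assumes hypothetical synthetic data with a large tissue effect and a small type effect, then computes PCA from the singular value decomposition of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 4 tissues x 2 types x 3 replicates, 500 probes.
n_tissues, n_types, n_reps, n_probes = 4, 2, 3, 500
tissue_effect = rng.normal(0.0, 2.0, size=(n_tissues, n_probes))  # large effect
type_effect = rng.normal(0.0, 0.3, size=(n_types, n_probes))      # small effect

rows, tissue_labels, type_labels = [], [], []
for t in range(n_tissues):
    for k in range(n_types):
        for _ in range(n_reps):
            noise = rng.normal(0.0, 0.5, size=n_probes)
            rows.append(8.0 + tissue_effect[t] + type_effect[k] + noise)
            tissue_labels.append(t)
            type_labels.append(k)

X = np.array(rows)           # samples x probes
Xc = X - X.mean(axis=0)      # center each probe across samples

# PCA via SVD: the columns of `scores` are the sample coordinates on each PC.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S
explained = S**2 / np.sum(S**2)   # fraction of total variance per PC

print("Variance explained by PC1-PC3:", np.round(explained[:3], 3))
```

In this synthetic setup the first three components carry most of the variance because the tissue effect was made much larger than the type effect, which is the same pattern the scatter plot shows for the real data.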
The next step is to draw a histogram to examine the samples. Select Plot Sample Histogram in the QA/QC section of the Gene Expression workflow to generate the Histogram tab (Figure 6).
Figure 6. Histogram tab
The histogram plots one line for each sample, with probe intensity on the X-axis and the frequency of that intensity on the Y-axis. This lets you view the distribution of intensities and identify any outliers. In this dataset, all the samples follow the same distribution pattern, indicating that there are no obvious outliers. As demonstrated with the PCA plot, if you click any line in the histogram, the corresponding row is highlighted in spreadsheet 1 (Down_Syndrome-GE). You can also change how the histogram displays the data by clicking the Plot Properties button. Explore these options on your own.
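The per-sample histogram described above can be sketched in Python with NumPy as follows. The intensity values are hypothetical random numbers rather than the tutorial data, and the deviation score at the end is an illustrative assumption, not a feature of the software.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical log2 intensities: 6 samples x 2000 probes.
intensities = rng.normal(loc=7.0, scale=1.5, size=(6, 2000))

# Shared bin edges so the curves are comparable across samples.
edges = np.linspace(intensities.min(), intensities.max(), 41)
centers = (edges[:-1] + edges[1:]) / 2   # x-values: intensity bin centers

# One frequency curve per sample: y-values are probe counts per bin.
curves = np.array([np.histogram(row, bins=edges)[0] for row in intensities])

# Crude outlier check (an assumption for illustration): how far each
# sample's curve lies from the median curve, as a fraction of its probes.
median_curve = np.median(curves, axis=0)
deviation = np.abs(curves - median_curve).sum(axis=1) / curves.sum(axis=1)

for i, d in enumerate(deviation):
    print(f"sample {i}: deviation from median curve = {d:.3f}")
```

Plotting each row of `curves` against `centers` would give one line per sample. A sample whose line sits far from the others would stand out with a noticeably larger deviation value.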
...