PGS Documentation

...

Figure 2. Configuring the PCA scatter plot: Color by Tissue, size by Type

Notice that the samples now cluster by tissue (Figure 3).

 

Figure 3. PCA scatter plot configured with color by Tissue, size by Type

Another way to see the cluster pattern is to draw an ellipse around each Tissue group.

  • Open the Plot Rendering Properties dialog and select the Ellipsoids tab
  • Select Add Ellipse/Ellipsoid
  • Select Ellipse in the Add Ellipse/Ellipsoid... dialog
  • Double-click Tissue in the Categorical Variable(s) panel to move it to the Grouping Variable(s) panel (Figure 4)
  • Select OK to close the Add Ellipse/Ellipsoid... dialog, then select OK again to exit the Plot Rendering Properties dialog

Figure 4. Adding ellipses to the PCA scatter plot
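For intuition about what a group ellipse summarizes, the sketch below computes a covariance ellipse for one group of 2-D points in plain Python. It is not how Partek Genomics Suite draws its ellipses; the function name `covariance_ellipse`, the 2-standard-deviation scaling, and the example coordinates are all illustrative assumptions.

```python
import math
from statistics import mean


def covariance_ellipse(points, n_std=2.0):
    """Return (center, semi_major, semi_minor, angle_deg) for 2-D points.

    Hypothetical helper: the ellipse axes are the eigenvectors of the group's
    sample covariance matrix, scaled to n_std standard deviations (assumed).
    """
    xs = [float(p[0]) for p in points]
    ys = [float(p[1]) for p in points]
    cx, cy = mean(xs), mean(ys)
    n = len(points)

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - cx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - cy) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in zip(xs, ys)) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below zero

    # Orientation of the major axis, measured from the x-axis.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (
        (cx, cy),
        n_std * math.sqrt(lam_major),
        n_std * math.sqrt(lam_minor),
        math.degrees(angle),
    )


# Made-up 2-D coordinates standing in for one tissue group's PC1/PC2 scores.
group = [(0, 0), (1, 1), (2, 2), (3, 3)]
center, semi_major, semi_minor, angle_deg = covariance_ellipse(group)
```

Computing one such ellipse per Tissue group gives a compact summary of where each group sits and how spread out it is, which is the visual cue the steps above add to the plot.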

Rotating the PCA plot shows that the samples separate by tissue and that, within some tissues, the Down syndrome and normal samples also separate. For example, in the Astrocyte and Heart tissues, the Down syndrome samples (small dots) are on the left and the normal samples (large dots) are on the right (Figure 5).

 

Figure 5. PCA scatter plot with ellipses, rotated to show separation by Type

PCA is a form of exploratory data analysis that is useful for identifying outliers and major effects in the data. The scatter plot shows that tissue is the largest source of variation: many genes are expressed differently among the four tissues, but fewer genes are expressed differently between types (Down syndrome and normal) across the whole chip (genome).
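To illustrate why a factor that shifts many genes dominates the first principal component, here is a minimal sketch in plain Python. The expression values are synthetic (not from the Down syndrome dataset), the function name `first_principal_component` is hypothetical, and power iteration is used only to keep the example dependency-free; it is not a description of how Partek Genomics Suite computes PCA.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (PC1 scores per sample, fraction of variance explained by PC1).

    rows: list of samples, each a list of gene expression values.
    Assumes the data are not constant (the covariance matrix is non-zero).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Gene-by-gene sample covariance matrix.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

    # Power iteration: repeatedly apply cov and renormalize to approach
    # the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    total_variance = sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return scores, eigenvalue / total_variance


# Synthetic data: the tissue effect shifts three genes, the type effect one gene.
samples = {
    "tissueA_normal": [0.0, 0.0, 0.0, 0.0],
    "tissueA_DS":     [0.0, 0.0, 0.0, 1.0],
    "tissueB_normal": [5.0, 5.0, 5.0, 0.0],
    "tissueB_DS":     [5.0, 5.0, 5.0, 1.0],
}
scores, explained = first_principal_component(list(samples.values()))
for name, score in zip(samples, scores):
    print(f"{name}: PC1 score = {score:.2f}")
print(f"PC1 explains {explained:.1%} of the variance")
```

In this toy example the two tissues land on opposite sides of PC1, while the Down syndrome and normal samples within a tissue share the same PC1 score, because the type effect touches only one gene and with a smaller shift.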

The next step is to draw a histogram to examine the samples. Select Plot Sample Histogram in the QA/QC section of the Gene Expression workflow to generate the Histogram tab (Figure 6).

 

Figure 6. Histogram tab

The histogram plots one line per sample, with probe intensity on the X-axis and the frequency of that intensity on the Y-axis. This lets you compare the intensity distributions and identify outliers. In this dataset, all samples follow the same distribution pattern, indicating that there are no obvious outliers. As with the PCA plot, clicking any line in the histogram highlights the corresponding row in spreadsheet 1 (Down_Syndrome-GE). You can also change how the histogram displays the data by clicking the Plot Properties button. Explore these options on your own.
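The idea behind the sample histogram can be sketched in a few lines of plain Python: bin each sample's probe intensities on a shared set of bins, then compare the resulting frequency profiles. The function name `intensity_histogram`, the bin range, and the intensity values below are made up for illustration and do not reflect the software's internal binning.

```python
def intensity_histogram(values, lo, hi, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical log2 probe intensities for three samples.
intensities = {
    "sample_1": [2.0, 3.5, 3.9, 8.0, 14.0],
    "sample_2": [2.2, 3.1, 4.4, 8.5, 13.0],
    "sample_3": [9.0, 11.5, 12.0, 13.5, 14.0],  # shifted distribution
}

# Using the same bins for every sample makes the profiles directly comparable.
profiles = {
    name: intensity_histogram(vals, lo=2.0, hi=14.0, n_bins=4)
    for name, vals in intensities.items()
}
for name, counts in profiles.items():
    print(name, counts)
```

A sample whose profile differs markedly from the others, like the third one in this made-up data, is the kind of candidate outlier the histogram is meant to expose.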

The decision to discard any samples should be based on information from the PCA plot, the sample histogram, and the QC metrics. To discard a sample and renormalize the data without the effects of the outlier, start over from sample import and omit the outlier sample(s) during the .CEL file import.

...