PGS Documentation


This tutorial illustrates how to:

  • Import Affymetrix CEL files and check quality
  • Add attributes describing the sample groups
  • Perform exploratory analysis using the PCA scatter plot
  • Find differentially expressed genes using ANOVA
  • Generate a list of genes of interest
  • Add annotations to the gene list

 

Note: the workflow described below is enabled in Partek Genomics Suite (PGS) version 7.0. Please contact the Partek Licensing Team at licensing@partek.com to request this version or update the software release via Help > Check for Updates from the main command line. The screenshots shown below may vary across platforms and across different versions of PGS.

 


Description of the Data Set

Down syndrome is caused by an extra copy of all or part of chromosome 21; it is the most common non-lethal trisomy in humans. At the time of the study used in this tutorial, conflicting reports had thrown into doubt whether individuals with Down syndrome have dysregulation of gene expression throughout the genome or primarily in genes from chromosome 21. To address this question, Affymetrix GeneChip™ Human U133A arrays were used to assay 25 samples taken from 10 human subjects, with or without Down syndrome, and 4 different tissues. The data revealed a significant upregulation of chromosome 21 genes at the gene expression level in individuals with Down syndrome; this dysregulation was largely specific to chromosome 21 and was not a genome-wide phenomenon.

The raw data for this study is available as experiment number GSE1397 in the Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/.

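Data and associated files for this tutorial can be downloaded using this link - Gene Expression Analysis tutorial data (right-click the link and choose "Save Link As" to download the tutorial data).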

Importing Affymetrix CEL Files

Download the data from the Partek® site to your local disk. The zip file contains both data and annotation files.

...

Figure 1: Selecting the gene expression workflow

...

Figure 2: Selecting the folder and CEL files for the experiment

  • Select the Add File(s) > button to move all the .CEL files to the right panel. Twenty-five CEL files will be processed
  • Select the Next > button to open the Import Affymetrix CEL Files dialog (Figure 3)

...

Figure 3: Import Affymetrix CEL Files dialog

  • Select Customize… to open the Advanced Import Options dialog (Figure 4) 

...

Figure 4: Advanced Import Options

  • Select Library Files… to open the Specify File Locations dialog (Figure 5). This dialog is used to specify the location of the library folder and the annotation files 

 

PGS will automatically assign the annotation files according to the chip type stored in the .CEL files. If the annotation files are not available in the library directory, PGS will automatically download them and store them in the Default Library File Folder.

  • The default library location can be modified by selecting Change... in the Default Library File Folder panel. By default, the library directory is at C:\Microarray Libraries. This directory is used to store all the external libraries and annotation files needed for analysis and visualization. The library directory can also be modified from Tools > File Manager in the main PGS menu
  • Select OK (Figure 5) to close the Specify File Locations dialog

...

Figure 5: Specify File Locations dialog

  • Select the Outputs tab from the Advanced Import Options dialog (Figure 6)

...

Figure 6: Outputs tab of the Advanced Import Options dialog

  • In the Extract Time Stamp and Date from CEL File panel, make sure the Date button is selected to extract the chip scan date. This information can help you detect batch effects caused by differences in processing time
  • In the Quality Assess of Gene Expression panel, leave the QC report button unselected. A user guide for the microarray data quality assessment and quality control features is available in the user manual
  • Select OK to exit the Advanced Import Options dialog
  • Select Import. The progress bar on the lower left of the Import Affymetrix CEL Files dialog will update as .CEL files are imported. Once all files have been imported, the Import Affymetrix CEL Files dialog will close

After the .CEL files have been imported, the result file will open in PGS as a spreadsheet named 1 (Down_Syndrome-GE). The spreadsheet should contain 25 rows representing the microarray chips (samples) and over 22,000 columns representing the probe sets (genes) (Figure 7).

...

Figure 7: Viewing the main or top-level spreadsheet with imported .CEL files

For additional information on importing data into PGS, see Chapter 4 Importing and Exporting Data in the Partek User’s Manual. The User’s Manual is available from the Partek Genomics Suite menu under Help > User’s Manual. The FAQ (Help > On-line Tutorials > FAQ) may also be helpful. Because this tutorial addresses only some topics, you may need to consult the User’s Manual for information about other useful features.

It is recommended that you become familiar with Chapter 6 The Pattern Visualization System of the user manual before going through the next section of the tutorial.

Adding Sample Information

Twenty-five CEL files (samples) have been imported into PGS as shown in Figure 7. Sample information must be added to define the grouping and the goals of the experiment.

  • Select Add Sample Attributes in the Import section of the Gene Expression workflow panel
  • Choose the option Add Attributes from an Existing Column and select OK to open the Sample Information Creation dialog

 

In this tutorial, the file name (e.g., Down Syndrome-Astrocyte-748-Male-1-U133A.CEL) contains the information about a sample, with the fields separated by hyphens (-). Choosing to split the file name by delimiters will separate the categories into different columns as shown in Figure 8.

  • In the Sample Information panel, specify the column labels (Labels 1-4) as Type, Tissue, Subject, and Gender, set each as categorical, and set the other columns as skip (see Figure 8). Select OK
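
The delimiter split performed by the dialog can be illustrated outside PGS with a few lines of Python. This is only a sketch of the idea, not how PGS implements it; the function name parse_sample_name and the LABELS tuple are hypothetical names chosen for this example.

```python
# Hypothetical helper, not part of PGS: mirrors the "split by delimiter" step.
LABELS = ("Type", "Tissue", "Subject", "Gender")  # Labels 1-4 from the tutorial


def parse_sample_name(file_name, delimiter="-"):
    """Split a CEL file name on the delimiter and label the first four fields."""
    fields = file_name.split(delimiter)
    if len(fields) < len(LABELS):
        raise ValueError(f"expected at least {len(LABELS)} fields in {file_name!r}")
    # zip stops after the four labels, so the remaining fields are skipped,
    # just as the other columns are set to skip in the dialog.
    return dict(zip(LABELS, fields))


sample = parse_sample_name("Down Syndrome-Astrocyte-748-Male-1-U133A.CEL")
print(sample)
```

Each of the four labeled fields corresponds to one categorical column added to the spreadsheet.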

...

Figure 8: Configuring the Sample Information Creation dialog

  • A dialog window asking if you would like to save the spreadsheet with the new sample attribute will appear. Select Yes
  • Make column 5. (Subject) random by right-clicking on the column header and selecting Properties from the pop-up menu (Figure 9). Select the Random Effect check box from the Properties dialog then select OK. The column 5. (Subject) will now be colored red, indicating that it is a random effect. 

...

Figure 9: Changing column properties

Note: More details on Random vs. Fixed Effects can be found later in this tutorial under the section Identifying Differentially Expressed Genes using the ANOVA.

Exploratory Data Analysis

At this point in the analysis, it is worth exploring the data before any formal testing. Do the genes you expected to be differentially regulated appear to have larger or smaller intensity values? Do similar samples resemble each other?

The latter question can be explored using Principal Components Analysis (PCA), an excellent method for reducing and visualizing high-dimensional data.

  • Select Plot PCA Scatter Plot from the QA/QC section of the Gene Expression workflow. A Scatter Plot tab containing your PCA plot will open (Figure 10)

...

Figure 10: PCA Scatter Plot tab

In the scatter plot, each point represents a chip (sample) and corresponds to a row on the top-level spreadsheet. The color of the dot represents the type of the sample; red represents a normal sample and blue represents a Down syndrome sample. Points that are close together in the plot have similar intensity values across the probe sets on the whole chip (genome), and points that are far apart in the plot are dissimilar.
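
To make the idea of "close points are similar samples" concrete, the following Python sketch computes each sample's coordinate on the first principal component for a tiny, made-up intensity table. It is an illustration under stated assumptions, not the PGS implementation: the data values are invented, the function names are hypothetical, and power iteration on the sample-by-sample Gram matrix is simply one convenient way to obtain the first component without external libraries.

```python
import math
import random


def center_columns(rows):
    """Subtract each column (probe set) mean from every sample."""
    count = len(rows)
    means = [sum(column) / count for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def first_pc_scores(rows, iterations=500, seed=0):
    """Return each sample's coordinate on the first principal component.

    The overall sign of the scores is arbitrary.
    """
    centered = center_columns(rows)
    # Sample-by-sample Gram matrix: stays small even with thousands of columns.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    rng = random.Random(seed)
    vector = [rng.uniform(-1.0, 1.0) for _ in centered]
    eigenvalue = 0.0
    for _ in range(iterations):  # power iteration toward the dominant eigenvector
        product = [sum(g * v for g, v in zip(row, vector)) for row in gram]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        if eigenvalue == 0.0:
            raise ValueError("all samples are identical; no variance to explain")
        vector = [p / eigenvalue for p in product]
    return [math.sqrt(eigenvalue) * v for v in vector]


# Made-up intensities: rows are samples, columns are probe sets.
expression = [
    [1.0, 1.1, 0.9],
    [1.2, 0.9, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 4.8, 5.0],
]
scores = first_pc_scores(expression)
for index, score in enumerate(scores):
    print(f"sample {index}: PC1 = {score:.2f}")
```

In this toy table the first two samples receive scores of one sign and the last two receive scores of the opposite sign, which is the numerical counterpart of two groups of points sitting apart in the scatter plot.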

...

As you can see from rotating the plot, there is no clear separation between Down syndrome and normal samples in this data since the red and blue samples are not separated in space. However, there are other factors that may separate the data.

  • In the Scatter Plot tab, select the Rendering Properties icon and configure the plot as shown (Figure 11)
  • Color the points by column 4. Tissue and size the points by column 3. Type
  • Select OK

...

Figure 11: Configuring the PCA scatter plot: Color by Tissue, size by Type

Notice now that the data are clustered by different tissues (Figure 12). 

...

Figure 12: Configured PCA scatter plot

  • Another way to see the cluster pattern is to put an ellipse around the Tissue groups. Select the Ellipsoids tab on the Plot Rendering Properties dialog
  • Select Add Ellipse/Ellipsoid
  • Select Ellipse in the Add Ellipse/Ellipsoid... dialog
  • Double click on Tissue in the Categorical Variable(s) panel to move it to the Grouping Variable(s) panel
  • Select OK to close the Add Ellipse/Ellipsoid... dialog and select OK again to exit the Plot Rendering Properties dialog

By rotating this PCA plot, you can see that the data is separated by tissues, and within some of the tissues, the Down syndrome samples and normal samples are separated. For example, in the Astrocyte and Heart tissues, the Down syndrome samples (small dots) are on the left, and the normal samples (large dots) are on the right (Figure 13).

...

Figure 13: PCA scatter plot with ellipses, rotated to show separation by Type 

PCA is an example of exploratory data analysis and is useful for identifying outliers and major effects in the data. From the scatter plot, you can see that tissue is the biggest source of variation. Many genes are expressed differently between the 4 tissues, but fewer genes are expressed differently between types (Down syndrome and normal) across the whole chip (genome).

...

Figure 14: Histogram tab 

The histogram plots one line for each of the samples with the intensity of the probes graphed on the X-axis and the frequency of the probe intensity on the Y-axis. This allows you to view the distribution of the intensities to identify any outliers. In this dataset, all the samples follow the same distribution pattern indicating that there are no obvious outliers in the data. As demonstrated with the PCA plot, if you click on any of the lines in the histogram, the corresponding row will be highlighted in the spreadsheet 1 (Down_Syndrome-GE). You can also change the way the histogram displays the data by clicking on the Plot Properties button. Explore these options on your own.
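
As a rough illustration of what each histogram line summarizes, the Python sketch below bins the intensities of each sample into equal-width bins and counts how many probes fall into each one. The sample names, intensity values, bin count, and range are all made up for the example, and intensity_histogram is a hypothetical helper rather than a PGS function.

```python
def intensity_histogram(intensities, bin_count, low, high):
    """Count how many probe intensities fall into each equal-width bin."""
    if bin_count <= 0 or high <= low:
        raise ValueError("need bin_count > 0 and high > low")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for value in intensities:
        if low <= value <= high:
            # The top edge belongs to the last bin.
            index = min(int((value - low) / width), bin_count - 1)
            counts[index] += 1
    return counts


# Made-up intensities for two hypothetical samples.
profiles = {
    "sample_1": [3.1, 4.5, 4.8, 6.2, 7.9, 9.4],
    "sample_2": [3.3, 4.4, 5.1, 6.0, 8.1, 9.0],
}
histograms = {
    name: intensity_histogram(values, bin_count=4, low=2.0, high=10.0)
    for name, values in profiles.items()
}
for name, counts in histograms.items():
    print(name, counts)
```

Samples whose counts follow a similar shape across the bins would appear as overlapping lines in the plot; a sample with a markedly different shape would stand out as a possible outlier.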

The decision to discard any samples would be based on information from the PCA plot, sample histogram plot, and QC metrics. To discard a sample and renormalize the data (without the effects of the outlier), start over with importing samples and omit the outlier sample(s) during the .CEL file import.

 

Identifying Differentially Expressed Genes using the ANOVA

Analysis of variance (ANOVA) is a very powerful technique for identifying differentially expressed genes in a multi-factor experiment such as this one. In this data set, the ANOVA will be used to generate a list of genes that are significantly different between Down syndrome and normal samples with an absolute fold change greater than 1.3.
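
The 1.3-fold criterion can be sketched in Python as follows. Two things here are assumptions not stated in this section: that the group means are on a log2 scale, and that fold change is reported with a sign (negative values meaning lower in the first group). The function names are hypothetical.

```python
def signed_fold_change(log2_mean_a, log2_mean_b):
    """Fold change of group A relative to group B from log2-scale means (assumed scale).

    Ratios below 1 are reported as negative reciprocals, so a halving is -2.0.
    """
    ratio = 2.0 ** (log2_mean_a - log2_mean_b)
    return ratio if ratio >= 1.0 else -1.0 / ratio


def passes_fold_change(log2_mean_a, log2_mean_b, threshold=1.3):
    """True when the absolute fold change is greater than the threshold."""
    return abs(signed_fold_change(log2_mean_a, log2_mean_b)) > threshold


print(signed_fold_change(8.0, 7.0))   # one log2 unit higher in group A
print(signed_fold_change(7.0, 8.0))   # one log2 unit lower in group A
print(passes_fold_change(7.2, 7.0))   # small difference
print(passes_fold_change(7.5, 7.0))   # larger difference
```

A gene would enter the list only if it also meets the significance criterion from the ANOVA; that part is not shown here.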

The ANOVA model should include Type since it is the primary factor of interest. From the exploratory analysis using the PCA plot, we observed that tissue is a large source of variation; therefore, Tissue should be included in the model. In the experiment, multiple samples were taken from the same subject, so Subject must be included in the model. If Subject were excluded from the model, the ANOVA assumption that samples within groups are independent would be violated. Additionally, the PCA scatter plot showed that the Down syndrome and normal samples separated within tissue type, so the Type*Tissue interaction should be included in the model.
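
Written out for a single probe set, the model described above could take the following form. The notation is an assumption made for illustration (the indices i, j, k and the symbols are not taken from PGS); Subject is treated as a random effect, as set in the column properties earlier.

```latex
% y_{ijk}: intensity of one probe set for Type i, Tissue j, Subject k
% \mu: overall mean; \varepsilon_{ijk}: residual error
% Subject_k is a random effect; the other terms are fixed effects
y_{ijk} = \mu + \mathrm{Type}_i + \mathrm{Tissue}_j
        + (\mathrm{Type} \times \mathrm{Tissue})_{ij}
        + \mathrm{Subject}_k + \varepsilon_{ijk}
```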

...


