PGS Documentation

...

  • Select Import Samples under the Import section of the workflow
  • Select Import from Affymetrix CEL Files and then click OK
  • Select the Browse button to select the C:\Partek Training Data\Down_Syndrome-GE folder. By default, all the files with a .CEL extension are selected (Figure 2)

Figure 2: Selecting the folder and CEL files for the experiment

  • Select the Add File(s) > button to move all the .CEL files to the right panel. Twenty-five CEL files will be processed
  • Select the Next > button to open the Import Affymetrix CEL Files dialog (Figure 3)

Figure 3: Import Affymetrix CEL Files dialog

  • Select Customize… to open the Advanced Import Options dialog (Figure 4)

Figure 4: Advanced Import Options

...

  • The default library location can be modified by selecting Change... in the Default Library File Folder panel. By default, the library directory is C:\Microarray Libraries. This directory is used to store all the external libraries and annotation files needed for analysis and visualization. The library directory can also be modified from Tools > File Manager in the main PGS menu
  • Select OK (Figure 5) to close the Specify File Locations dialog

Figure 5: 

  • Select the Outputs tab from the Advanced Import Options dialog (Figure 6)

Figure 6: 

  • In the Extract Time Stamp and Date from CEL File panel, make sure the Date button is selected to extract the chip scan date. This information can help you detect batch effects caused by processing time
  • In the Quality Assess of Gene Expression panel, leave the QC report button unselected. A user guide for the microarray data quality assessment and quality control features is available in the user manual.
  • Select OK to exit the Advanced Import Options dialog
  • Select Import. The progress bar on the lower left of the Import Affymetrix CEL files dialog will update as .CEL files are imported. Once all files have been imported, the Import Affymetrix CEL Files dialog will close

After the .CEL files have been imported, the result file will open in PGS as a spreadsheet named 1 (Down_Syndrome-GE). The spreadsheet should contain 25 rows representing the microarray chips (samples) and over 22,000 columns representing the probe sets (genes) (Figure 7).

Figure 7: Viewing the main or top-level spreadsheet with imported .CEL files

...

  • In the Sample Information panel, specify the column labels (Labels 1-4) as Type, Tissue, Subject, and Gender, set each as categorical, and set the other columns as skip (Figure 8). Select OK

Figure 8: Configuring the Sample Information Creation dialog

  • A dialog window asking if you would like to save the spreadsheet with the new sample attribute will appear. Select Yes
  • Make column 5. (Subject) random by right-clicking on the column header and selecting Properties from the pop-up menu (Figure 9). Select the Random Effect check box from the Properties dialog then select OK. The column 5. (Subject) will now be colored red, indicating that it is a random effect. 
  • To save changes to the spreadsheet, select the Save Active Spreadsheet icon. Spreadsheets with unsaved changes have an asterisk next to their name in the spreadsheet tree.

Figure 9: Changing column properties

...

  • Select Plot PCA Scatter Plot from the QA/QC section of the Gene Expression workflow. A Scatter Plot tab containing your PCA plot will open (Figure 10)

Figure 10: PCA Scatter Plot tab

...

  • In the Scatter Plot tab, select the Rendering Properties icon and configure the plot as shown (Figure 11)
  • Color the points by column 4. Tissue and size the points by column 3. Type
  • Select OK

Figure 11: Configuring the PCA scatter plot: Color by Tissue, size by Type

...

Figure 12: Configured PCA scatter plot

Another way to see the cluster pattern is to put an ellipse around the Tissue groups.

  • Open the Plot Rendering Properties dialog and select the Ellipsoids tab (Figure 13)
  • Select Add Ellipse/Ellipsoid
  • Select Ellipse in the Add Ellipse/Ellipsoid... dialog
  • Double click on Tissue in the Categorical Variable(s) panel to move it to the Grouping Variable(s) panel
  • Select OK to close the Add Ellipse/Ellipsoid... dialog and select OK again to exit the Plot Rendering Properties dialog

Figure 13: Adding Ellipses to PCA Scatter Plot 

By rotating this PCA plot, you can see that the data is separated by tissue, and within some of the tissues, the Down syndrome samples and normal samples are separated. For example, in the Astrocyte and Heart tissues, the Down syndrome samples (small dots) are on the left, and the normal samples (large dots) are on the right (Figure 14).

Figure 14: PCA scatter plot with ellipses, rotated to show separation by Type
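To give a feel for what a PCA scatter plot is showing, the following is a minimal sketch in Python of finding the first principal component of a small samples-by-probe-sets table and projecting each sample onto it. It is not PGS's implementation: the function name `first_principal_component`, the use of power iteration, and the toy values are all assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction of greatest variance, centered rows).

    Uses power iteration on the sample covariance matrix; this is an
    illustrative choice, not a description of how PGS computes PCA.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the columns (probe sets).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector, centered


# Hypothetical samples (rows) measured on three probe sets (columns).
samples = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]

pc1, centered = first_principal_component(samples)

# Projecting each centered sample onto pc1 gives its coordinate on the PC1 axis.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
```

Samples with similar profiles receive similar scores, which is why related samples appear close together in the plot.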

...

The next step is to draw a histogram to examine the samples. Select Plot Sample Histogram in the QA/QC section of the Gene Expression workflow to generate the Histogram tab (Figure 15).

 

Figure 15: Histogram tab

The histogram plots one line for each of the samples with the intensity of the probes graphed on the X-axis and the frequency of the probe intensity on the Y-axis. This allows you to view the distribution of the intensities to identify any outliers. In this dataset, all the samples follow the same distribution pattern indicating that there are no obvious outliers in the data. As demonstrated with the PCA plot, if you click on any of the lines in the histogram, the corresponding row will be highlighted in the spreadsheet 1 (Down_Syndrome-GE). You can also change the way the histogram displays the data by clicking on the Plot Properties button. Explore these options on your own.
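The idea of one intensity distribution per sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `intensity_histogram`, the bin settings, and the intensity values are hypothetical and do not come from the tutorial data or from PGS.

```python
def intensity_histogram(intensities, n_bins, lo, hi):
    """Count how many probe intensities fall into each equal-width bin on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in intensities:
        if value < lo or value > hi:
            continue  # ignore values outside the plotted range
        # A value exactly at the top edge is counted in the last bin.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical log2 intensities for two samples (not from the tutorial data).
samples = {
    "sample_1": [2.1, 3.4, 3.6, 5.0, 7.9, 8.0],
    "sample_2": [2.3, 3.1, 3.9, 5.2, 7.5, 7.7],
}

# A shared range and bin count keep the per-sample curves comparable.
histograms = {
    name: intensity_histogram(values, n_bins=3, lo=2.0, hi=8.0)
    for name, values in samples.items()
}
```

A sample whose counts differ markedly from the others across the bins would stand out as a line with a different shape in the plot.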

...

  • To invoke the ANOVA dialog, select Detect Differentially Expressed Genes in the Analysis section of the Gene Expression workflow
  • In the Experimental Factor(s) panel, select Type, Tissue and Subject by pressing <Ctrl> and left clicking each factor
  • Use the Add Factor > button to move the selections to the ANOVA Factor(s) panel
  • To specify the interaction, select Type and Tissue by pressing <Ctrl> and left clicking each factor. Select the Add Interaction > button to add the Type * Tissue interaction to the ANOVA Factor(s) panel (Figure 16). Do NOT select OK or Apply. We will be adding contrasts to this ANOVA in an upcoming section of the tutorial.

Figure 16: ANOVA configuration


Random vs. Fixed Effects – Mixed Model ANOVA

...

  • Select Contrasts… to invoke the Configure dialog
  • Choose 6. Type from the Select Factor/Interaction drop-down list. The levels in this factor are listed on the Candidate Level(s) panel on the left side of the dialog (Figure 17)
  • Left click to select Down Syndrome from the Candidate Level(s) panel and move it to the Group 1 panel (renamed Down Syndrome) by selecting Add Contrast Level > in the top half of the dialog. Label 1 will be changed to the subgroup name automatically, but you can also manually specify the label name
  • Select Normal from the Candidate Level(s) panel and move it to the Group 2 panel (renamed Normal)

Figure 17: Configuring contrasts for ANOVA

...

  • Select Add Contrast to add the Down Syndrome vs. Normal contrast 
  • Select OK to apply the configuration
  • If successfully added, the Contrasts… button will now read Contrasts Included
  • By default, the Specify Output File box is checked (Figure 17) and gives a name to the output file. If you are trying to determine which factors should be included in the model and you do not wish to save the output file, simply uncheck this box
  • Select OK in the ANOVA dialog to compute the 3-way mixed-model ANOVA
  • Several progress messages will display in the lower left-hand side of the ANOVA dialog while the results are being calculated.

The result will be displayed in a child spreadsheet, ANOVA-3way (ANOVAResults). In the child result spreadsheet, each row represents a gene, and the columns represent the computation results for that gene (Figure 18). By default, the genes are sorted in ascending order by the p-value of the first categorical factor. In this tutorial, Type is the first categorical factor, which means the most significantly differentially expressed gene between Down syndrome and normal samples is at the top of the spreadsheet in row 1.

 

Figure 18: ANOVA spreadsheet
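The default ordering of the result spreadsheet (ascending p-value of the first factor) can be mimicked outside PGS. The short Python sketch below assumes each row has been exported as a (probe set ID, p-value) pair; the probe set names and p-values are hypothetical.

```python
# Hypothetical ANOVA rows: (probe set ID, p-value for Type). Values are made up.
anova_rows = [
    ("probe_a", 0.0400),
    ("probe_b", 0.0002),
    ("probe_c", 0.0100),
]

# Ascending p-value puts the most significant probe set in the first row.
sorted_rows = sorted(anova_rows, key=lambda row: row[1])
top_probe = sorted_rows[0][0]
```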

For additional information about ANOVA in PGS, see Chapter 11 Inferential Statistics in the User’s Manual (Help > User’s Manual).

...

  • View the sources of variation for each of the factors across the whole genome by clicking Plot Sources of Variation from the Analysis section of the Gene Expression workflow with the ANOVA result spreadsheet active
  • A Sources of Variation tab will appear (Figure 19) with a bar chart showing the signal-to-noise ratio for each factor. Sources of variation can also be viewed as a pie chart showing sum of squares by selecting the Pie Chart (Sum of Squares) tab in the upper left-hand side of the Sources of Variation tab

Figure 19: Sources of Variation tab showing a bar chart

...

Another useful graph is the ANOVA Interaction Plot, which is also accessed by right-clicking.

  • Right-click on a row header in the ANOVA spreadsheet (Figure 20)
  • Select ANOVA Interaction Plot from the options to generate an Interaction Plot tab for that individual gene

...

 

Figure 20: Creating an ANOVA Interaction plot

Generate these plots for rows 3 (DSCR3) and 8 (CSTB). If the lines in this plot are not parallel, then there is a chance there is an interaction between Tissue and Type. DSCR3 is a good example of this. We can look at the p-values in column 9, p-value(Type * Tissue), to check if this apparent interaction is statistically significant.
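The "non-parallel lines" idea can be expressed numerically as a difference of differences between group means. The Python sketch below uses hypothetical expression values for a single gene; it only measures the size of the apparent interaction and says nothing about significance, which is what the p-value(Type * Tissue) column is for.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene, grouped by (type, tissue).
cells = {
    ("Down Syndrome", "Heart"): [8.0, 8.2],
    ("Normal", "Heart"): [7.0, 7.2],
    ("Down Syndrome", "Astrocyte"): [6.1, 6.3],
    ("Normal", "Astrocyte"): [6.0, 6.2],
}

cell_means = {key: mean(values) for key, values in cells.items()}


def type_effect(tissue):
    """Down Syndrome mean minus Normal mean within one tissue."""
    return cell_means[("Down Syndrome", tissue)] - cell_means[("Normal", tissue)]


# Parallel lines would give the same Type effect in every tissue,
# so a nonzero difference of differences hints at an interaction.
interaction_contrast = type_effect("Heart") - type_effect("Astrocyte")
```

Here the Type effect is about 1.0 in Heart but only about 0.1 in Astrocyte, so the two lines in an interaction plot of this toy gene would not be parallel.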

 

Create Gene List

Now that you have obtained statistical results from the microarray experiment, you can take the results for all 22,283 genes and create a new spreadsheet containing only those genes that pass certain criteria. This streamlines data management by focusing on the genes with the most significant differential expression or substantial fold change. In PGS, the List Manager can be used to specify numerous conditions for generating a list of genes of interest. In this tutorial, we are going to create a list of genes with a fold change less than -1.3 or greater than 1.3 and an unadjusted p-value below 0.0005. The following section illustrates how to use the List Manager to create this gene list.

  • Invoke the List Manager dialog by selecting Create Gene List in the Analysis section of the Gene Expression workflow
  • Ensure that the 1/ANOVA-3way (ANOVAResults) spreadsheet is selected, as this is the spreadsheet we will be using to create our new gene list (Figure 21)
  • Select the ANOVA Streamlined tab. In the Contrast: find genes that change between two categories panel, choose Down Syndrome vs. Normal and select Have Any Change from the Setting drop-down list. This will find genes that change in either direction between the two types of samples
  • In the Configuration for “Down Syndrome vs Normal” panel, check that Include size of the change is selected and enter 1.3 into Fold change > and -1.3 into OR Fold change <
  • Select Include significance of the change, choose unadjusted p-value from the dropdown menu, and < 0.0005 for the cutoff. The number of genes that pass your cutoff criteria will be shown next to the # Pass field. In this example, 22 genes pass the criteria. 
  • Set Save the list as A, select Create, and then select Close to view the new gene list spreadsheet

Figure 21: Creating a gene list from ANOVA results
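The criteria configured in the steps above amount to a simple two-part filter. The following Python sketch applies the same cutoffs (fold change above 1.3 or below -1.3, unadjusted p-value below 0.0005) to a handful of rows; the function name `passes_cutoffs`, the gene names, and the values are hypothetical, and the sketch assumes fold changes are reported as signed values.

```python
def passes_cutoffs(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.0005):
    """True if the change exceeds fc_cutoff in either direction and p_value is below p_cutoff."""
    big_change = fold_change > fc_cutoff or fold_change < -fc_cutoff
    return big_change and p_value < p_cutoff


# Hypothetical rows: (gene name, signed fold change, unadjusted p-value).
results = [
    ("gene_1", 1.8, 0.0001),
    ("gene_2", -1.5, 0.0003),
    ("gene_3", 1.1, 0.0001),  # change too small
    ("gene_4", 2.0, 0.0100),  # p-value above the cutoff
]

gene_list = [name for name, fc, p in results if passes_cutoffs(fc, p)]
```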

...

  • Under the Visualization section in the Gene Expression workflow, select Cluster Based on Significant Genes
  • The Cluster Significant Genes dialog asks you to specify the type of clustering you want to perform. Select Hierarchical Clustering and select OK
  • Choose the Down_Syndrome_vs_Normal (A) spreadsheet under Spreadsheet with differentially expressed genes
  • Choose Standardize – shift genes to mean of zero and scale to standard deviation of one under the Expression normalization panel. This option will adjust all the gene intensities such that the mean is zero and the standard deviation is 1
  • Select OK to generate a Hierarchical Clustering tab (Figure 22)

Figure 22: Hierarchical Clustering results

The graph (Figure 22) illustrates the standardized gene expression level of each gene in each sample. Each gene is represented in one column, and each sample is represented in one row. Genes that are unchanged have a value of zero and are colored black. Genes with increased expression have positive values and are colored red. Genes with reduced expression have negative values and are colored green. Down syndrome samples are colored red and normal samples are colored orange. On the left-hand side of the graph, we can see that the Down syndrome samples cluster together.
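The standardization applied before clustering (shift to a mean of zero, scale to a standard deviation of one) is the usual z-score transform. The Python sketch below shows it for one gene across five samples. The values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the source does not say which form PGS uses.

```python
from statistics import mean, stdev


def standardize(values):
    """Shift values to mean zero and scale to standard deviation one."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - center) / spread for v in values]


# Hypothetical expression values of one gene across five samples.
expression = [6.0, 7.0, 8.0, 9.0, 10.0]
z_scores = standardize(expression)
```

After this transform, a value of zero means the gene is at its own average level in that sample, positive values mean above average, and negative values mean below average, which matches the black, red, and green coloring described above.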

...

  • In the Down_Syndrome_vs_Normal (A) spreadsheet, right click on the second column header 2. ProbesetID and select Insert Annotation from the pop-up menu (Figure 23)
  • Select Chromosomal Location under the Column Configuration panel. Leave everything else as default and select OK

Figure 23: Adding a gene annotation

...

  • To create a dot plot showing expression levels of a specific gene for each sample, right click on the row header and select Dot Plot (Orig. Data) from the pop-up menu. This generates a Dot Plot tab for the selected gene (Figure 24)

Figure 24: Dot plot results for gene Down syndrome critical region 3

...

  • Select the 1/ANOVA-3way (ANOVAResults) spreadsheet in the Analysis tab. This is the spreadsheet we will be using to create the gene list
  • Select View > Volcano Plot from the PGS main menu
  • Set X Axis (Fold-Change) to 12. Fold-Change(Down Syndrome vs. Normal), and the Y axis (p-value) to 10. p-value(Down Syndrome vs. Normal)
  • Select OK to generate a Volcano Plot tab for genes in the spreadsheet (Figure 25)

Figure 25: Volcano Plot results for Down syndrome vs. Normal contrast

In the plot, each dot represents a gene. The X-axis represents the fold change of the contrast, and the Y-axis represents the range of p-values. Genes with increased expression in Down syndrome samples are on the right side of the N/C line; genes with reduced expression in Down syndrome samples are on the left. Genes become more statistically significant with increasing Y-axis position. The genes that have larger and more significant changes between the Down syndrome and normal groups are in the upper-right and upper-left corners (Figure 25).

In order to select the genes by fold-change and p-value, we will draw a horizontal line to represent the p-value 0.05 and two vertical lines indicating the –1.3 and 1.3-fold changes (cutoff lines).

  • Select Rendering Properties
  • Choose the Axes tab
  • Select the Set Cutoff Lines button and configure the Set Cutoff Lines dialog as shown (Figure 26)
  • Check Select all points in a section to allow PGS to automatically select all the points in any given section
  • Select OK to draw the cutoff lines
  • Select OK in the Plot Rendering Properties dialog to close the dialog and view the plot

Figure 26: Setting cutoff lines for -1.3 to 1.3 fold changes and p-value of 0.05

The plot will be divided into six sections. By clicking on the upper-right section, all genes in that section will be selected (Figure 27).
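The six sections come from crossing one horizontal cutoff (p-value 0.05) with two vertical cutoffs (fold changes of -1.3 and 1.3). A minimal Python sketch of assigning genes to sections is shown below; the function name `volcano_section`, the section labels, and the gene values are hypothetical, and the handling of values lying exactly on a cutoff line is an arbitrary choice.

```python
def volcano_section(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.05):
    """Return (row, column) of the volcano-plot section containing a gene."""
    # Smaller p-values plot higher on the Y-axis.
    row = "upper" if p_value < p_cutoff else "lower"
    if fold_change > fc_cutoff:
        column = "right"
    elif fold_change < -fc_cutoff:
        column = "left"
    else:
        column = "middle"
    return row, column


# Hypothetical genes: (name, signed fold change, unadjusted p-value).
genes = [
    ("gene_1", 1.9, 0.001),
    ("gene_2", -1.6, 0.020),
    ("gene_3", 1.1, 0.300),
    ("gene_4", 1.5, 0.200),
]

# Equivalent of clicking the upper-right section of the plot.
upper_right = [
    name for name, fc, p in genes if volcano_section(fc, p) == ("upper", "right")
]
```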

  • Right-click on the selected region in the plot and choose Create List to create a list including the genes from the selected section. Note that these p-values are uncorrected

...

  • Specify a name for the gene list and write a brief description of the list (Figure 27). The description is shown when you right-click on the spreadsheet > Info > Comments

 The list can be saved as a text file (File > Save As Text File) for use in reports or by downstream analysis software.

 

Figure 27: Creating a gene list from a Volcano Plot


End of Tutorial

This is the end of the tutorial. If you need additional assistance with this data set, email us at support@partek.com or contact the Partek Technical Support staff at:

...