PGS Documentation


  • To invoke the ANOVA dialog, select Detect Differentially Expressed Genes in the Analysis section of the Gene Expression workflow
  • In the Experimental Factor(s) panel, select Type, Tissue and Subject by pressing <Ctrl> and left clicking each factor
  • Use the Add Factor > button to move the selections to the ANOVA Factor(s) panel
  • To specify the interaction, select Type and Tissue by pressing <Ctrl> and left clicking each factor. Select the Add Interaction > button to add the Type * Tissue interaction to the ANOVA Factor(s) panel (Figure 15). Do NOT select OK or Apply. We will be adding contrasts to this ANOVA in an upcoming section of the tutorial.



Figure 15: ANOVA configuration 


Random vs. Fixed Effects – Mixed Model ANOVA

Most factors in ANOVA are fixed effects, whose levels in a data set represent all the levels of interest. In this study, Type and Tissue are fixed effects. If the levels of a factor in a data set only represent a random sample of all the levels of interest (for example, Subject), the factor is a random effect. The ten subjects in this study represent only a random sample of the global population about which inferences are being made. Random effects are colored red on the spreadsheet and in the ANOVA dialog. When the ANOVA model includes both random and fixed factors, it is a mixed-model ANOVA.

Another way to determine if a factor is random or fixed is to imagine repeating the experiment. Would the same levels of each factor be used again?

  • Type – Yes, the same types would be used again, so Type is a fixed effect
  • Tissue – Yes, the same tissues would be used again, so Tissue is a fixed effect
  • Subject – No, the samples would be taken from other subjects, so Subject is a random effect

You can specify which factors are random and which are fixed when you import your data or after importing by right-clicking on the column corresponding to a categorical variable, selecting Properties, and checking Random effect. By doing that, the ANOVA will automatically know which factors to treat as random and which factors to treat as fixed.

 

Nested/Nesting Relationships

The Subject factor in the ANOVA model is listed as “8. Subject (6. Type)”, which means that Subject is nested in Type: each subject belongs to only one Type. PGS can automatically detect this sort of hierarchical design and adjusts the ANOVA calculation accordingly.
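To illustrate why a nested, random Subject factor changes the calculation, here is a minimal sketch for a simplified balanced design: Type is fixed, Subject is random and nested in Type, and every subject contributes the same number of measurements (Tissue is ignored). The function name `nested_f_for_type` and the data layout are hypothetical, and this is the textbook expected-mean-squares approach rather than PGS's own implementation, which also handles unbalanced data. In this setting, Type is tested against the Subject(Type) mean square instead of the residual error.

```python
from statistics import mean

def nested_f_for_type(data):
    # Hypothetical layout: data maps each Type level to a list of subjects,
    # and each subject is a list of replicate log2 values. Balanced design assumed.
    levels = list(data.values())
    a = len(levels)          # number of Type levels
    b = len(levels[0])       # subjects per Type level
    n = len(levels[0][0])    # measurements per subject

    type_means = [mean(x for subj in subjects for x in subj) for subjects in levels]
    grand = mean(type_means)  # equals the grand mean when the design is balanced

    ss_type = sum(b * n * (m - grand) ** 2 for m in type_means)
    ss_subject = sum(n * (mean(subj) - m) ** 2
                     for subjects, m in zip(levels, type_means) for subj in subjects)
    ss_error = sum((x - mean(subj)) ** 2
                   for subjects in levels for subj in subjects for x in subj)

    ms_type = ss_type / (a - 1)
    ms_subject = ss_subject / (a * (b - 1))
    ms_error = ss_error / (a * b * (n - 1))
    # First value: F for Type when Subject is random (denominator MS_Subject(Type)).
    # Second value: F a naive fixed-effects analysis would report (denominator MS_Error).
    return ms_type / ms_subject, ms_type / ms_error

f_random, f_naive = nested_f_for_type({
    "Down Syndrome": [[6.0, 6.2], [6.4, 6.6]],  # hypothetical log2 values
    "Normal": [[5.8, 6.0], [6.0, 6.2]],
})
```

With these toy values the F statistic for Type is about 1.8 when tested against Subject(Type), but about 9.0 when tested against the residual error, which would overstate the evidence because repeated measurements from one subject are not independent.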

Linear Contrasts

By default, an ANOVA only outputs a p-value for each factor/interaction. To get the fold change and ratio between Down syndrome and normal samples, a contrast must be set up.

  • Select Contrasts… to invoke the Configure dialog
  • Choose 6.Type from the Select Factor/Interaction drop-down list. The levels in this factor are listed on the Candidate Level(s) panel on the left side of the dialog (Figure 16)
  • Left click to select Down Syndrome from the Candidate Level(s) panel and move it to the Group 1 panel (renamed Down Syndrome) by selecting Add Contrast Level> in the top half of the dialog. Label 1 will be changed to the subgroup name automatically, but you can also manually specify the label name
  • Select Normal from the Candidate Level(s) panel and move it to the Group 2 panel (renamed Normal)


Figure 16: Configuring contrasts for ANOVA

Because the data are log2 transformed, PGS detects this and automatically selects Yes for Data is already log transformed? in the top right-hand corner. PGS will use the geometric mean of the samples in each group to calculate the fold change and mean ratio for the contrast between the Down syndrome and Normal samples.
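On log2 data, the ratio of geometric means is simply 2 raised to the difference of the group means. The sketch below shows that arithmetic, together with the signed fold-change convention (ratios below 1 reported as -1/ratio) that the ±1.3 fold-change cutoffs in this tutorial rely on. The function name is hypothetical, and PGS may compute the contrast from model-based (least-squares) means rather than raw group means.

```python
from statistics import mean

def contrast_ratio_and_fold_change(group1_log2, group2_log2):
    # Ratio of geometric means: on log2 data this is 2 ** (difference of means).
    ratio = 2 ** (mean(group1_log2) - mean(group2_log2))
    # Signed fold change: a ratio below 1 is reported as -1/ratio,
    # so a two-fold decrease is -2 rather than 0.5.
    fold_change = ratio if ratio >= 1 else -1 / ratio
    return ratio, fold_change
```

For example, group means of 7.0 and 6.0 on the log2 scale give a ratio and fold change of 2.0; swapping the groups gives a ratio of 0.5 and a fold change of -2.0.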

  • Select Add Contrast to add the Down Syndrome vs. Normal contrast 
  • Select OK to apply the configuration
  • If successfully added, the Contrasts… button will now read Contrasts Included
  • By default, Specify Output File is checked in the ANOVA dialog (Figure 15) and a name is given to the output file. If you are trying to determine which factors should be included in the model and do not wish to save the output file, simply uncheck this box
  • Select OK in the ANOVA dialog to compute the 3-way mixed-model ANOVA
  • Several progress messages will display in the lower left-hand side of the ANOVA dialog while the results are being calculated.

The result will be displayed in a child spreadsheet, ANOVA-3way (ANOVAResults). In the child result spreadsheet, each row represents a gene, and the columns represent the computation results for that gene (Figure 17). By default, the genes are sorted in ascending order by the p-value of the first categorical factor. In this tutorial, Type is the first categorical factor, which means the most significantly differentially expressed gene between Down syndrome and normal samples is at the top of the spreadsheet in row 1.


Figure 17: ANOVA spreadsheet

For additional information about ANOVA in PGS, see Chapter 11 Inferential Statistics in the User’s Manual (Help > User’s Manual).

Viewing the Sources of Variation

Deciding which factors to include in the ANOVA can be an iterative process, because not all factors and interactions are relevant and not all of them have to be included in the model. For example, in this tutorial, Gender and Scan date were not included. The Sources of Variation plot quantifies the relative contribution of each factor in the model toward explaining the variability of the data.

  • View the sources of variation for each of the factors across the whole genome by clicking Plot Sources of Variation from the Analysis section of the Gene Expression workflow with the ANOVA result spreadsheet active
  • A Sources of Variation tab will appear (Figure 18) with a bar chart showing the signal-to-noise ratio for each factor. Sources of variation can also be viewed as a pie chart showing sums of squares by selecting the Pie Chart (Sum of Squares) tab in the upper left-hand side of the Sources of Variation tab


Figure 18: Sources of Variation tab showing a bar chart

 

This plot presents the mean signal-to-noise ratio of all the genes on the microarray. All the factors in the ANOVA model are listed on the X-axis (including random error). The Y-axis represents the mean, across all genes, of the ratio of each factor's mean square to the error mean square. Mean square is ANOVA’s measure of variance. Compare each factor bar to the error bar; if a factor bar is higher than the error bar, that factor contributes more variation to the data than random error does, on average across all genes. Notice that this plot is very consistent with the results in the PCA scatter plot. In this data, on average, Tissue is the largest source of variation.
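One reading of the Y-axis description is sketched below: for each source, divide its mean square by the error mean square gene by gene, then average over genes. The input layout and function name are hypothetical, and PGS's exact computation may differ (for example, in how random-effect terms are handled). By construction the error bar comes out at 1.

```python
from statistics import mean

def sources_of_variation(ms_by_gene):
    # Hypothetical layout: one dict per gene mapping source name -> mean square.
    # Each dict must include an "Error" entry holding the residual mean square.
    sources = ms_by_gene[0].keys()
    # Mean over genes of (mean square of the source / error mean square).
    return {src: mean(gene[src] / gene["Error"] for gene in ms_by_gene)
            for src in sources}

ratios = sources_of_variation([
    {"Type": 4.0, "Tissue": 10.0, "Error": 2.0},  # hypothetical gene 1
    {"Type": 2.0, "Tissue": 6.0, "Error": 1.0},   # hypothetical gene 2
])
```

Here Tissue averages 5.5 and Type 2.0, both above the error value of 1.0, so in this toy example Tissue would be drawn as the tallest bar.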

To view the source of variation for each individual gene, right click on a row header in the ANOVA-3way (ANOVAResults) spreadsheet and select the Sources of Variation item from the pop-up menu. This generates a Sources of Variation tab for the individual gene. View a few Sources of Variation plots from rows at the top of the ANOVA table and a few from the bottom of the table.

Another useful graph is the ANOVA Interaction Plot, which is also accessed by right-clicking on a row header in the ANOVA spreadsheet. Select ANOVA Interaction Plot from the pop-up menu to generate an Interaction Plot tab for that individual gene. Generate these plots for rows 3 (DSCR3) and 8 (CSTB). If the lines in this plot are not parallel, there may be an interaction between Tissue and Type; DSCR3 is a good example of this. We can look at the p-values in column 9, p-value(Type * Tissue), to check whether this apparent interaction is statistically significant.
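What "not parallel" means numerically can be sketched from the cell means: compute the Down syndrome minus Normal difference within each tissue. Parallel lines correspond to the same difference in every tissue; differences that vary across tissues are what the Type * Tissue p-value formally tests. The function name, the dictionary layout, and the tissue labels are hypothetical.

```python
from statistics import mean

def type_difference_by_tissue(cells, group1="Down Syndrome", group2="Normal"):
    # Hypothetical layout: cells maps (type, tissue) -> list of log2 values.
    tissues = sorted({tissue for _, tissue in cells})
    # Difference between the two Type means within each tissue.
    # Parallel lines in the interaction plot mean these values are all equal.
    return {t: mean(cells[(group1, t)]) - mean(cells[(group2, t)]) for t in tissues}

diffs = type_difference_by_tissue({
    ("Down Syndrome", "tissue_1"): [7.0, 7.2],
    ("Normal", "tissue_1"): [6.0, 6.2],
    ("Down Syndrome", "tissue_2"): [6.1, 6.1],
    ("Normal", "tissue_2"): [6.0, 6.0],
})
```

In this toy example the difference is about 1.0 in tissue_1 but only about 0.1 in tissue_2, so the lines would not be parallel.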

 

Create Gene List

Now that you have obtained statistical results from the microarray experiment, you can take the results for all 22,283 genes and create a new spreadsheet containing only the genes that pass certain criteria. This streamlines data management by focusing on the genes with the most significant differential expression or substantial fold change. In PGS, the List Manager can be used to specify numerous conditions for generating a list of genes of interest. In this tutorial, we are going to create a list of genes with a fold change greater than 1.3 or less than -1.3 and an unadjusted p-value below 0.0005. The following section illustrates how to use the List Manager to create this gene list.

  • Invoke the List Manager dialog by selecting Create Gene List in the Analysis section of the Gene Expression workflow
  • Ensure that the 1/ANOVA-3way (ANOVAResults) spreadsheet is selected, as this is the spreadsheet we will use to create the new gene list (Figure 19)
  • Select the ANOVA Streamlined tab. In the Contrast: find genes that change between two categories panel, choose Down Syndrome vs. Normal and select Have Any Change from the Setting drop-down list. This finds genes whose expression changes in either direction between the two sample types
  • In the Configuration for “Down Syndrome vs Normal” panel, check that Include size of the change is selected and enter 1.3 into Fold change > and -1.3 into OR Fold change <
  • Select Include significance of the change, choose unadjusted p-value from the drop-down menu, and set the cutoff to < 0.0005. The number of genes that pass your cutoff criteria is shown next to the # Pass field. In this example, 23 genes pass the criteria.
  • Set Save the list as A, select Create, and then select Close to view the new gene list spreadsheet
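The List Manager settings above amount to a simple row filter. The sketch below expresses that logic, assuming each ANOVA result row carries a signed fold change and an unadjusted p-value for the contrast; the function names and the dictionary keys are hypothetical and do not correspond to a PGS API.

```python
def passes_criteria(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.0005):
    # "Have Any Change": fold change above +cutoff OR below -cutoff
    # (signed fold change, so decreases are negative numbers).
    changed = fold_change > fc_cutoff or fold_change < -fc_cutoff
    return changed and p_value < p_cutoff

def create_gene_list(rows):
    # rows: iterable of dicts with hypothetical "fold_change" and "p_value" keys.
    return [row for row in rows if passes_criteria(row["fold_change"], row["p_value"])]

rows = [
    {"gene": "g1", "fold_change": 1.5, "p_value": 0.0001},    # up, significant
    {"gene": "g2", "fold_change": -1.4, "p_value": 0.0002},   # down, significant
    {"gene": "g3", "fold_change": 1.2, "p_value": 0.00001},   # change too small
    {"gene": "g4", "fold_change": 2.0, "p_value": 0.01},      # p-value too large
]
selected = create_gene_list(rows)
```

Only g1 and g2 are kept: g3 fails the fold-change condition and g4 fails the p-value condition.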


Figure 19: Creating a gene list from ANOVA results

The spreadsheet Down_Syndrome_vs_Normal (A) will be created as a child spreadsheet under the Down_Syndrome-GE spreadsheet.

This gene list spreadsheet can now be used for further analysis such as hierarchical clustering, gene ontology, integration of copy number data, or export to other data analysis tools such as pathway analysis software.

You should take some time creating new gene list criteria of your own to become familiar with the List Manager tool in PGS. For more information, you can always click the help buttons in the dialog.

 

Hierarchical Clustering

The gene list in spreadsheet Down_Syndrome_vs_Normal (A) can now be used for hierarchical clustering to visualize patterns in the data.

  • Under the Visualization section in the Gene Expression workflow, select Cluster Based on Significant Genes
  • The Cluster Significant Genes dialog asks you to specify the type of clustering you want to perform. Select Hierarchical Clustering and select OK
  • Choose the Down_Syndrome_vs_Normal (A) spreadsheet under Spreadsheet with differentially expressed genes
  • Choose the Standardize – shift genes to mean of zero and scale to standard deviation of one under the Expression normalization panel. This option will adjust all the gene intensities such that the mean is zero and the standard deviation is 1
  • Select OK to generate a Hierarchical Clustering tab (Figure 20)
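The Standardize option rescales each gene's values to a mean of zero and a standard deviation of one before clustering. A minimal sketch for one gene's values across samples follows; using the sample standard deviation (n - 1 denominator) is an assumption, and PGS may use a different denominator.

```python
from statistics import mean, stdev

def standardize(values):
    # Shift one gene's values to mean zero and scale to standard deviation one.
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - mu) / sd for v in values]
```

For example, `standardize([1.0, 2.0, 3.0])` returns `[-1.0, 0.0, 1.0]`. Standardizing makes genes with very different overall intensities comparable, so clustering reflects their patterns across samples rather than their absolute levels.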


Figure 20: Hierarchical Clustering results

The graph (Figure 20) illustrates the standardized expression level of each gene in each sample. Each gene is represented in one column, and each sample is represented in one row. Values near zero (close to the gene's mean across samples) are colored black, values above the gene's mean are positive and colored red, and values below the gene's mean are negative and colored green. Down syndrome samples are colored red and normal samples are colored orange. On the left-hand side of the graph, we can see that the Down syndrome samples cluster together.

For more information on the methods used for clustering, you can refer to Chapter 8: Hierarchical & Partitioning Clustering in Help > User’s Manual. For a tutorial on configuring the clustering plot, please refer to the user guide available from Help > On-line Tutorials > User Guides.


Adding Gene Annotation

During data import, the GeneChip annotation file was linked to the imported data. This linked annotation information can be added as new columns to the ANOVA or gene list spreadsheets. For example, we can add annotation to the gene list we created from the ANOVA results as follows:

  • In the Down_Syndrome_vs_Normal (A) spreadsheet, right click on the second column header 2. ProbesetID and select Insert Annotation from the pop-up menu (Figure 21)
  • Select Chromosomal Location under the Column Configuration panel. Leave everything else as default and select OK


Figure 21: Adding a gene annotation 

Interestingly, of the 23 genes in the Down_Syndrome_vs_Normal (A) spreadsheet, 20 are located on chromosome 21.

  • To create a dot plot showing expression levels of a specific gene for each sample, right click on the row header and select Dot Plot (Orig. Data) from the pop-up menu. This generates a Dot Plot tab for the selected gene (Figure 22)


Figure 22: Dot plot results for gene Down syndrome critical region 3

In the plot, each dot is a sample of the original data. The Y-axis represents the log2 normalized intensity of the gene, and the X-axis represents the different types of samples. In this example, the two groups have different medians: ~6.3 for the Down syndrome samples and ~6.0 for the normal samples. The line inside each Box & Whiskers plot represents the median of the samples in that group. Placing the mouse cursor over a Box & Whiskers plot shows its median and range.

 

Generating Gene Lists from a Volcano Plot

Next, we will generate a list of genes that passed a p-value threshold of 0.05 and fold-changes greater than 1.3 using a volcano plot.

  • Ensure that the 1/ANOVA-3way (ANOVAResults) spreadsheet in the Analysis tab is selected as this is the spreadsheet we will be using to create the gene list
  • Select View > Volcano Plot from the PGS main menu
  • Set X Axis (Fold-Change) to 12. Fold-Change(Down Syndrome vs. Normal), and the Y Axis (p-value) to 10. p-value(Down Syndrome vs. Normal)
  • Select OK to generate a Volcano Plot tab for the genes in the spreadsheet (Figure 23)


Figure 23: Volcano Plot results for Down syndrome vs. Normal contrast

In the plot, each dot represents a gene. The X-axis represents the fold change of the contrast, and the Y-axis represents the p-value. Genes with increased expression in Down syndrome samples are on the right side of the N/C line; genes with reduced expression in Down syndrome samples are on the left. Genes become more statistically significant with increasing Y-axis position. The genes with larger and more significant changes between the Down syndrome and normal groups are in the upper-right and upper-left corners (Figure 23).

In order to select the genes by fold-change and p-value, we will draw a horizontal line to represent the p-value 0.05 and two vertical lines indicating the –1.3 and 1.3-fold changes (cutoff lines).

  • Select Rendering Properties
  • Choose the Axes tab
  • Select the Set Cutoff Lines button and configure the Set Cutoff Lines dialog as shown (Figure 23)
  • Check Select all points in a section to allow PGS to automatically select all the points in any given section
  • Select OK to draw the cutoff lines
  • Select OK in the Plot Rendering Properties dialog to close the dialog and view the plot


Figure 23: Setting cutoff lines for -1.3 to 1.3 fold changes and p value of 0.05

The plot will be divided into six sections. By clicking on the upper-right section, all genes in that section will be selected (Figure 24).
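The six sections come from crossing three fold-change bands (below -1.3, between -1.3 and 1.3, above 1.3) with two p-value bands (below 0.05, above it). The sketch below assigns a gene to a section under those cutoffs; the function name and section labels are hypothetical, and how PGS treats points lying exactly on a cutoff line is an assumption.

```python
def volcano_section(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.05):
    # Three vertical bands split by the -cutoff and +cutoff fold-change lines.
    if fold_change < -fc_cutoff:
        column = "left"
    elif fold_change > fc_cutoff:
        column = "right"
    else:
        column = "middle"
    # Two horizontal bands: smaller p-values are plotted higher on the Y-axis.
    row = "upper" if p_value < p_cutoff else "lower"
    return row + "-" + column
```

For example, a gene with a fold change of 2.0 and a p-value of 0.01 falls in the "upper-right" section, which is the section selected in this tutorial.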


  • Right-click on the selected region in the plot and choose Create List to create a list including the genes from the section selected. Note that these p-values are uncorrected.

Note: If no column is selected in the parent (ANOVA) spreadsheet, all of the columns will be included in the gene list; if some columns are selected, only the selected columns will be included in the list.

  • Specify a name for the gene list and write a brief description about the list (Figure 35). The description is shown when you right-click on the spreadsheet > Info > Comments

 The list can be saved as a text file (File > Save As Text File) for use in reports or by downstream analysis software.

 

End of Tutorial

This is the end of the tutorial. If you need additional assistance with this data set, email us at support@partek.com or contact the Partek Technical Support staff at:

North America

(9:00 a.m. - 5:00 p.m. CST)
+1-314-884-6172

Europe
(9:00 a.m. - 5:00 p.m. GMT)
+44 2071 930426 or +1.314.884.6173

Asia/Australasia
(9:00 a.m. - 6:00 p.m. SGT)
+65 6808 8706

 
