PGS Documentation

...

Analysis of variance (ANOVA) is a powerful technique for identifying differentially expressed genes in a multi-factor experiment such as this one. In this data set, the ANOVA will be used to generate a list of genes that are significantly different between Down syndrome and normal samples, with an absolute fold change greater than 1.3.

The ANOVA model should include Type because it is the primary factor of interest. From the exploratory analysis using the PCA plot, we observed that tissue is a large source of variation; therefore, Tissue should be included in the model. In the experiment, multiple samples were taken from the same subject, so Subject must be included in the model. If Subject were excluded from the model, the ANOVA assumption that samples within groups are independent would be violated. Additionally, the PCA scatter plot showed that the Down syndrome and normal samples separated within each tissue type, so the Type*Tissue interaction should be included in the model.

  • To invoke the ANOVA dialog, select Detect Differentially Expressed Genes in the Analysis section of the Gene Expression workflow
  • In the Experimental Factor(s) panel, select Type, Tissue and Subject by holding <Ctrl> and left-clicking each factor
  • Use the Add Factor > button to move the selections to the ANOVA Factor(s) panel
  • To specify the interaction, select Type and Tissue by holding <Ctrl> and left-clicking each factor. Select the Add Interaction > button to add the Type * Tissue interaction to the ANOVA Factor(s) panel (Figure 1). Do NOT select OK or Apply; we will add contrasts to this ANOVA model in an upcoming section of the tutorial.

...

Another way to determine if a factor is random or fixed is to imagine repeating the experiment. Would the same levels of each factor be used again?

  • Type – Yes, the same types would be used again; a fixed effect
  • Tissue – Yes, the same tissues would be used again; a fixed effect
  • Subject – No, the samples would be taken from other subjects; a random effect

...

The Subject factor in the ANOVA model is listed as “5. Subject (3. Type)”, which means that Subject is nested in Type. Partek Genomics Suite automatically detects this sort of hierarchical design and adjusts the ANOVA calculation accordingly.
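For reference, the model built above can be written as an equation. This is a sketch in standard mixed-model notation, assuming one sample per subject and tissue. The symbols are our own labels: T for Type, S for Tissue, TS for the Type*Tissue interaction, and P for Subject nested in Type. Partek Genomics Suite's internal parameterization may differ. Type, Tissue and their interaction are fixed effects, and the nested subject term is random.

```latex
y_{ijk} = \mu + T_i + S_j + (TS)_{ij} + P_{k(i)} + \varepsilon_{ijk},
\qquad P_{k(i)} \sim \mathcal{N}(0, \sigma_P^2), \quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

Here y_{ijk} is the log2 signal of one gene for type i, tissue j and subject k. The subscript k(i) expresses the nesting: each subject belongs to exactly one type.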

...

By default, an ANOVA only outputs a p-value for each factor/interaction. To get the fold change and ratio between Down syndrome and normal samples, a contrast must be set up.

  • Select Contrasts… to invoke the Configure dialog
  • Choose 6.Type from the Select Factor/Interaction drop-down list. The levels in this factor are listed on the Candidate Level(s) panel on the left side of the dialog
  • Left-click to select Down Syndrome in the Candidate Level(s) panel and move it to the Group 1 panel (renamed Down Syndrome) by selecting Add Contrast Level > in the top half of the dialog. Label 1 is changed to the subgroup name automatically, but you can also specify the label name manually
  • Select Normal in the Candidate Level(s) panel and move it to the Group 2 panel (renamed Normal)
  • The Add Contrast button can now be selected (Figure 2)

Figure 2. Adding a contrast of Down Syndrome and Normal samples

Because the data is log2 transformed, Partek Genomics Suite automatically detects this and selects Yes for Data is already log transformed? in the top right-hand corner of the dialog. Partek Genomics Suite uses the geometric mean of the samples in each group to calculate the fold change and mean ratio for the contrast between the Down syndrome and normal samples.
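The calculation described above can be sketched in Python. If you average the log2 values and raise 2 to that average, you get the geometric mean on the linear scale. The ratio of two groups is therefore 2 raised to the difference of their log2 means.

A few parts of this sketch are assumptions rather than Partek's documented behavior:
- The function name is hypothetical.
- The signed fold-change convention reports the ratio when it is at least 1 and -1/ratio otherwise. We chose it to match the negative fold-change cutoff used later in the List Manager.
- The sketch uses simple group means. The ANOVA contrast may also adjust for the other factors in the model.

```python
from statistics import mean


def ratio_and_fold_change(group1_log2, group2_log2):
    """Ratio and signed fold change of group 1 vs. group 2 (hypothetical helper).

    Inputs are log2-transformed signals. 2 ** mean(log2 values) is the
    geometric mean on the linear scale, so the ratio of geometric means
    is 2 ** (mean1 - mean2).
    """
    ratio = 2 ** (mean(group1_log2) - mean(group2_log2))
    # Assumed signed convention: decreases are reported as -1 / ratio,
    # so a 1.3-fold decrease appears as -1.3.
    fold_change = ratio if ratio >= 1 else -1 / ratio
    return ratio, fold_change


# Made-up log2 signals for a single gene, for illustration only
down_syndrome = [8.0, 8.5, 9.0]
normal = [8.0, 8.0, 8.0]
print(ratio_and_fold_change(down_syndrome, normal))  # ratio and fold change both ~1.4142
print(ratio_and_fold_change(normal, down_syndrome))  # ratio ~0.7071, fold change ~-1.4142
```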

  • Select Add Contrast to add the Down Syndrome vs. Normal contrast 
  • Select OK to apply the configuration
  • If successfully added, the Contrasts… button will now read Contrasts Included (Figure 3)

Figure 3. ANOVA configuration with contrasts included

  • By default, Specify Output File is checked and gives a name to the output file. If you are trying to determine which factors should be included in the model and do not wish to save the output file, uncheck this box
  • Select OK in the ANOVA dialog to compute the 3-way mixed-model ANOVA
  • Several progress messages will display in the lower left-hand side of the ANOVA dialog while the results are being calculated.

The result will be displayed in a child spreadsheet, ANOVA-3way (ANOVAResults). In this spreadsheet, each row represents a gene, and the columns represent the computation results for that gene (Figure 4). By default, the genes are sorted in ascending order by the p-value of the first categorical factor. In this tutorial, Type is the first categorical factor, so the most significantly differentially expressed gene between Down syndrome and normal samples is at the top of the spreadsheet in row 1.
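The default ordering amounts to an ascending sort on a single p-value column. The sketch below illustrates this. The gene identifiers and p-values are made up, and the p_type field is a hypothetical stand-in for the first categorical factor's p-value column.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str      # hypothetical gene identifier
    p_type: float  # p-value of the first categorical factor (Type)


results = [
    GeneResult("gene_a", 0.03),
    GeneResult("gene_b", 1e-6),
    GeneResult("gene_c", 0.4),
]

# Ascending p-value: the most significant gene ends up in row 1
ranked = sorted(results, key=lambda r: r.p_type)
for row, r in enumerate(ranked, start=1):
    print(row, r.gene, r.p_type)
```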

...

Figure 5. Sources of Variation tab showing a bar chart

This plot presents the mean signal-to-noise ratio of all the genes on the microarray. All the non-random factors in the ANOVA model, as well as error, are listed on the X-axis. The Y-axis represents the mean, across all genes, of the ratio of each factor's mean square to the error mean square. Mean square is ANOVA’s measure of variance. Compare the bar for each signal to the bar for error; if a factor's bar is higher than the error bar, that factor contributed significant variation to the data across all the genes. Notice that this plot is very consistent with the results of the PCA scatter plot. In this data set, on average, Tissue is the largest source of variation.
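The sketch below makes the Y-axis quantity concrete, using a single factor for simplicity. For each gene, it computes the factor's mean square and the error mean square, takes their ratio, and averages the ratios over all genes. The helper names and example values are hypothetical. Partek Genomics Suite computes these terms from the full mixed model, not from a one-way ANOVA like this one. Under this definition, the error bar itself would have a height of 1, because it is the error mean square divided by itself.

```python
from statistics import mean


def mean_squares(groups):
    """One-way ANOVA mean squares (factor, error) for a single gene."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_factor = len(groups) - 1
    df_error = len(values) - len(groups)
    return ss_factor / df_factor, ss_error / df_error


def mean_signal_to_noise(genes):
    """Average over genes of MS(factor) / MS(error)."""
    ratios = []
    for groups in genes:
        ms_factor, ms_error = mean_squares(groups)
        ratios.append(ms_factor / ms_error)
    return mean(ratios)


# Two hypothetical genes, each measured in two groups of three samples
genes = [
    [[1, 2, 3], [4, 5, 6]],  # strong group effect: ratio 13.5
    [[1, 3, 5], [2, 4, 6]],  # weak group effect: ratio 0.375
]
print(mean_signal_to_noise(genes))  # 6.9375
```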

To view the sources of variation for an individual gene, right-click on a row header in the ANOVA-3way (ANOVAResults) spreadsheet and select Sources of Variation from the pop-up menu. This generates a Sources of Variation tab for that gene. View a few Sources of Variation plots from rows at the top of the ANOVA table and a few from the bottom.

Another useful graph is the ANOVA Interaction Plot.

  • Right-click on a row header in the ANOVA spreadsheet (Figure 6)
  • Select ANOVA Interaction Plot from the options to generate an Interaction Plot tab for that individual gene

Figure 6. Calling an ANOVA Interaction Plot for a gene

Generate these plots for rows 3 (DSCR3) and 8 (CSTB). If the lines in the interaction plot are not parallel, there may be an interaction between Tissue and Type. DSCR3 is a good example of this (Figure 7). To check whether this apparent interaction is statistically significant, look at the p-values in column 9, p-value(Type * Tissue).


Figure 7. Interaction Plot for DSCR3
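Parallel lines mean that the difference between Down syndrome and normal is the same in every tissue. The sketch below checks this on cell means, using hypothetical tissue names and values. A nonzero spread only flags a possible interaction. The p-value(Type * Tissue) column is still what tells you whether that interaction is significant.

```python
from statistics import mean


def type_difference_by_tissue(cells, type_a, type_b):
    """Mean(type_a) - mean(type_b) within each tissue.

    cells maps (type, tissue) -> list of log2 values for one gene.
    """
    tissues = sorted({tissue for (_, tissue) in cells})
    return {t: mean(cells[(type_a, t)]) - mean(cells[(type_b, t)]) for t in tissues}


def non_parallelism(differences):
    """Spread of the per-tissue differences; 0 means perfectly parallel lines."""
    return max(differences.values()) - min(differences.values())


# Hypothetical gene with a Type difference in tissue_1 but not in tissue_2
cells = {
    ("Down Syndrome", "tissue_1"): [9.0, 9.5],
    ("Normal", "tissue_1"): [8.0, 8.5],
    ("Down Syndrome", "tissue_2"): [7.0, 7.5],
    ("Normal", "tissue_2"): [7.0, 7.5],
}
diffs = type_difference_by_tissue(cells, "Down Syndrome", "Normal")
print(diffs)                   # {'tissue_1': 1.0, 'tissue_2': 0.0}
print(non_parallelism(diffs))  # 1.0
```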


We can view the expression level of a gene in each sample using a dot plot.

  • Right-click on the gene row header and select Dot Plot (Orig. Data) from the pop-up menu. This generates a Dot Plot tab for the selected gene (Figure 8)

...

Now that you have obtained statistical results from the microarray experiment, you can take the results for all 22,283 genes and create a new spreadsheet containing just the genes that pass certain criteria. This streamlines data management by focusing on the genes with the most significant differential expression or the largest fold change. The List Manager can be used to specify numerous conditions for selecting genes of interest. In this tutorial, we are going to create a list of genes with a fold change greater than 1.3 or less than -1.3 and an unadjusted p-value < 0.0005.

  • Invoke the List Manager dialog by selecting Create Gene List in the Analysis section of the Gene Expression workflow
  • Ensure that the 1/ANOVA-3way (ANOVAResults) spreadsheet is selected, as this is the spreadsheet we will use to create our new gene list (Figure 9)
  • Select the ANOVA Streamlined tab. In the Contrast: find genes that change between two categories panel, choose Down Syndrome vs. Normal and select Have Any Change from the Setting drop-down list. This finds genes with different expression levels in the two types of samples
  • In the Configuration for “Down Syndrome vs Normal” panel, check that Include size of the change is selected, and enter 1.3 in Fold change > and -1.3 in OR Fold change <
  • Select Include significance of the change, choose unadjusted p-value from the drop-down menu, and enter < 0.0005 as the cutoff. The number of genes that pass your cutoff criteria is shown next to the # Pass field. In this example, 23 genes pass the criteria.
  • Set Save the list as A, select Create, and then select Close to view the new gene list spreadsheet
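The List Manager criteria above amount to a simple row filter. The sketch below assumes the signed fold-change convention discussed earlier. The rows and the helper name are hypothetical, not Partek output.

```python
def passes(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.0005):
    """True if the change exceeds the cutoff in either direction and p is small enough."""
    big_enough = fold_change > fc_cutoff or fold_change < -fc_cutoff
    return big_enough and p_value < p_cutoff


# Hypothetical (gene, fold change, unadjusted p-value) rows
rows = [
    ("gene_a", 1.8, 0.0001),   # up in Down syndrome, significant -> kept
    ("gene_b", -2.1, 0.0002),  # down in Down syndrome, significant -> kept
    ("gene_c", 1.1, 0.00001),  # significant but change too small -> dropped
    ("gene_d", 3.0, 0.01),     # large change but p-value too big -> dropped
]
gene_list_a = [gene for gene, fc, p in rows if passes(fc, p)]
print(gene_list_a)  # ['gene_a', 'gene_b']
```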

...

This gene list spreadsheet can now be used for further analysis, such as hierarchical clustering, gene ontology, or integration of copy number data, or it can be exported to other data analysis tools, such as pathway analysis.

Take some time to practice creating gene lists with criteria of your own to become familiar with the List Manager tool. For more information, you can always click on the () buttons.

...

  • Select the 1/ANOVA-3way (ANOVAResults) spreadsheet in the Analysis tab. This is the spreadsheet our gene list will be drawn from
  • Select View > Volcano Plot from the Partek Genomics Suite main menu (Figure 10)

...

  • Set X Axis (Fold-Change) to 12. Fold-Change(Down Syndrome vs. Normal), and Y Axis (p-value) to 10. p-value(Down Syndrome vs. Normal)
  • Select OK to generate a Volcano Plot tab for the genes in the ANOVA spreadsheet (Figure 11)

Figure 11. Volcano plot generated from ANOVA spreadsheet

In the plot, each dot represents a gene. The X-axis represents the fold change of the contrast (Down syndrome vs. Normal), and the Y-axis represents the p-value. Genes with increased expression in Down syndrome samples are on the right side of the N/C (no change) line; genes with reduced expression in Down syndrome samples are on the left. Genes become more statistically significant with increasing Y-axis position. The genes with larger and more significant changes between the Down syndrome and normal groups are in the upper right and upper left corners.
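The sketch below shows how each dot's position can be computed. Plotting -log10(p) is a common way to place smaller p-values higher on the Y-axis. The following are assumptions here, not documented Partek behavior:
- That Partek Genomics Suite uses exactly this transform.
- The helper names.
- The signed fold-change convention, in which no change corresponds to 1/-1.

```python
import math


def volcano_point(fold_change, p_value):
    """(x, y) position of one gene: fold change vs. -log10(p-value)."""
    return fold_change, -math.log10(p_value)


def side_of_nc_line(fold_change):
    """Which side of the no-change line a gene falls on (signed fold change)."""
    if fold_change > 1:
        return "increased in Down syndrome"
    if fold_change < -1:
        return "reduced in Down syndrome"
    return "no change"


x, y = volcano_point(-2.0, 0.001)
print(x, round(y, 6), side_of_nc_line(x))  # -2.0 3.0 reduced in Down syndrome
```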

...