PGS Documentation


If you have a region list or a BED file and a microarray experiment with data, you can summarize the data according to the genomic coordinates contained in the region list. For instance, the region list contains a list of CpG islands, the experiment contains methylation percentage values for probes (β values), and you would like to summarize the methylation values of the individual probes for each CpG island. Or you have a list of copy number amplifications and microarray gene expression data, and you are interested in determining whether the average intensities of the probes in those regions are higher than expected.

  • Import the region list (or BED file) and specify the region property as explained elsewhere in this document

  • With the region list spreadsheet selected, right-click in any column header and select Insert Average
  • A dialog box similar to the one shown opens. With the Add Average tab selected, specify where you would like the new columns to appear by using the two pull-down menus labeled Add to the and of Column. Specify the top-level spreadsheet containing the data you wish to average (β values, gene-intensity values, etc.) in the Get average from spreadsheet pull-down menu. Choose the radio button that specifies how the averaging should be done. The bottom two choices (Mean of all samples and Mean value for all samples separately) are self-explanatory; the first option (Mean of samples significant in region) is used when the region list has a SampleID associated with each region. In this case, the column designated as the SampleID column in the top-level spreadsheet is used to identify the sample to be summarized for each region.
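The averaging choices described above can be illustrated outside Genomics Suite with a minimal Python sketch. This is not the program's implementation: the function names, the tuple layout, and the assumption that a probe belongs to a region when its position falls between the start and stop coordinates (inclusive) are all hypothetical.

```python
from statistics import mean

def probes_in_region(region, probes):
    """Collect the per-sample values of every probe located inside the region.

    region: (chrom, start, stop), coordinates treated as inclusive (assumption).
    probes: list of (chrom, position, [value for each sample]).
    """
    chrom, start, stop = region
    return [values for (p_chrom, pos, values) in probes
            if p_chrom == chrom and start <= pos <= stop]

def mean_of_all_samples(region, probes):
    """One number per region: the mean over all probes and all samples."""
    flat = [v for values in probes_in_region(region, probes) for v in values]
    return mean(flat) if flat else None

def mean_per_sample(region, probes):
    """One number per sample: the mean over the probes inside the region."""
    hits = probes_in_region(region, probes)
    if not hits:
        return None
    # zip(*hits) regroups the values so that each tuple holds one sample.
    return [mean(sample_values) for sample_values in zip(*hits)]

# Made-up example: one CpG island and three probes with two samples each.
island = ("chr1", 100, 200)
beta_values = [
    ("chr1", 120, [0.25, 0.75]),
    ("chr1", 180, [0.5, 1.0]),
    ("chr1", 500, [0.9, 0.9]),   # outside the island, ignored
]
print(mean_of_all_samples(island, beta_values))
print(mean_per_sample(island, beta_values))
```

The two functions correspond loosely to the Mean of all samples and Mean value for all samples separately choices; the SampleID-based option would additionally restrict the average to the sample named for each region.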

Find region overlaps

You have a list of regions from another analysis program (perhaps you detected peaks using an R program) and you’d like to compare that region list with a region list that Genomics Suite calculated. Perhaps you have two lists created by Genomics Suite (one generated from peak detection with one set of parameters and the other created with different parameters) and you’d like to see what the two lists have in common. You may use the Tools > Find Region Overlaps command to compare two or more region lists as shown.

There are two separate modes of operation for this command: Report all regions and Only report regions present in all lists. The first option, Report all regions, reports all regions in the lists. If regions from different lists overlap, the intersection of the regions is reported along with the start and stop coordinates of the intersection and the percent overlap between the intersected region and each of the regions in the input lists. If a region is found in only one list, it is reported as well.

In contrast, the second option, Only report regions present in all lists, intersects the lists and reports only the regions found in all of them.
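The difference between the two modes can be sketched in a few lines of Python for the two-list case. This is an illustration under stated assumptions, not the algorithm used by Genomics Suite: the function names and the output layout are hypothetical, and coordinates are treated as inclusive.

```python
def intersect(a, b):
    """Return the intersection of two (chrom, start, stop) regions, or None."""
    if a[0] != b[0]:
        return None
    start, stop = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, stop) if start <= stop else None

def length(region):
    """Region length in bases, assuming inclusive coordinates."""
    return region[2] - region[1] + 1

def find_region_overlaps(list_a, list_b, report_all=True):
    """Return rows of (region, percent of the A region, percent of the B region).

    report_all=True  : also list regions found in only one input list.
    report_all=False : list only the intersections present in both lists.
    """
    rows, seen_a, seen_b = [], set(), set()
    for i, a in enumerate(list_a):
        for j, b in enumerate(list_b):
            hit = intersect(a, b)
            if hit is None:
                continue
            seen_a.add(i)
            seen_b.add(j)
            rows.append((hit,
                         100 * length(hit) / length(a),
                         100 * length(hit) / length(b)))
    if report_all:
        # Regions without a partner are reported whole; the other list's
        # percentage is left empty (None).
        rows += [(a, 100.0, None) for i, a in enumerate(list_a) if i not in seen_a]
        rows += [(b, None, 100.0) for j, b in enumerate(list_b) if j not in seen_b]
    return rows

# Made-up example lists.
peaks_r = [("chr1", 100, 199), ("chr2", 1, 10)]
peaks_pgs = [("chr1", 150, 249)]
for row in find_region_overlaps(peaks_r, peaks_pgs, report_all=True):
    print(row)
```

The nested loop is quadratic in the number of regions, which is acceptable for an illustration; sorted lists or an interval tree would be the usual choice for large inputs.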

Importing genomic locations to be used with annotating SNVs

The Tools > Annotate SNVs feature requires four columns of data per genomic location: the position of the SNP (chr.basePosition), the SampleName, a reference base, and the SNP call (single nucleotide or genotype) as shown in Figure 20. 

  • Prepare the input list as shown in Figure 20 and save it as either a tab-separated or comma-separated file

  • Use File > Import > Text to import the table. During import, change the data type of column 1 (as in Figure 3) to text by right-clicking on the color bar of column one and changing the data type to text. You may leave the other columns as categorical response types

  • The correct properties must be set for this spreadsheet. Right-click on the newly imported spreadsheet in the navigator and select Properties

  • Choose Other in the Configure Spreadsheet dialog (Figure 6)
  • Make sure Genomic is selected in the Add Property pull-down menu and select Add. In the next dialog box, select Genomic location instead of marker IDs under Choose the type of genomic data. The Marker ID in column should be set to the first column. [If Marker ID in column does not contain any items in the pull-down list, it is likely that the first column was not a text column (drawn in gray) during import. If this happens, right-click the column header in the spreadsheet and change Type: to text.]
  • Specify the Species from a pull-down menu selection or by typing in the species name. Select Edit Genome to specify the Species Name, Genome Version, Cytoband file, and 2Bit sequence file. The last two fields are optional. Select OK

Now that the properties have been set appropriately, Tools > Annotate SNVs may be invoked on this spreadsheet.
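If the SNV calls come from another program, the four-column input table can be written with a short script. The following Python sketch is only an illustration: Figure 20 is not reproduced here, so the header names, the presence of a header row, and the "chr1.12345" rendering of chr.basePosition are assumptions that should be checked against the figure before use.

```python
import csv
import io

def format_snv_table(calls):
    """Return tab-separated text with one row per SNV call.

    calls: iterable of (chrom, base_position, sample_name, ref_base, call).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Hypothetical header names; the text only names the four kinds of data.
    writer.writerow(["Position", "SampleName", "Reference", "Call"])
    for chrom, pos, sample, ref, call in calls:
        # Assumed rendering of chr.basePosition, for example "chr1.12345".
        writer.writerow([f"{chrom}.{pos}", sample, ref, call])
    return buffer.getvalue()

# Made-up example calls.
table = format_snv_table([
    ("chr1", 12345, "Sample_A", "A", "G"),
    ("chr7", 55249071, "Sample_B", "C", "CT"),
])
print(table)
```

The returned text can be saved to a .txt file and imported with File > Import > Text as described above.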

Importing a BED file

A BED (Browser Extensible Data) file is a special case of a region list: it is a tab-delimited text file whose first three columns contain the chromosome, start, and stop locations. To import a BED file to be used as a data region list, follow the import instructions for region lists. A BED file can also be visualized as an annotation file containing regions in the Genome Browser.
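To check what a BED file contains before importing it, the three required columns can be read with a few lines of Python. This is a minimal sketch, not part of Genomics Suite; the function name is hypothetical, and the skipping of "track", "browser", and "#" lines reflects header lines that BED files downloaded from UCSC may carry.

```python
import io

def read_bed(handle):
    """Return (chrom, start, stop) tuples from an open BED file.

    Only the first three tab-delimited columns are used; any further
    columns are ignored. In the UCSC BED convention the start is 0-based
    and the stop is exclusive.
    """
    regions = []
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and optional header or comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions

# Made-up example content; a real file would be opened with open(path).
example = io.StringIO("track name=example\nchr1\t100\t200\nchr2\t5\t50\t+\n")
print(read_bed(example))
```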

Using a BED file as an annotation source for the genome browser

BED files do not contain individual sequences, nor do the regions have names. For instance, the UCSC Table Browser has a BED file that contains reads from a long non-coding RNA-Seq experiment, and you might like to view this information in the context of your dataset. Before you can visualize a BED file in the chromosome viewer, you must create a Partek annotation file from the BED file.

  • From the top command menu, select Tools > Annotation Manager
  • In the My Annotations tab, select Create Annotation
  • Select BED file (.bed) under Choose Annotation Type
  • Under File Locations, specify the Source (input BED file) with the Browse button. You may also specify the Result file name (of the annotation file) and location with the Browse button. You might consider saving the result file to your Microarray Libraries folder

  • In the Annotation Details section of the dialog box, specify the Name of the annotation database that will be visible from within Genomics Suite, the Species, and the Genome Build. Use Preview Chromosome Names if the chromosome names in the annotation file must be changed to match the chromosome names in the genomic annotations.

  • Select OK

Visualizing a BED file as an annotation track in the genome browser

In order to use a BED file as an Annotation track in the Genome Browser, first create the annotation file as described above, being careful to have specified the species and genome build appropriately.

  • Invoke the Genome Browser by right-clicking in any spreadsheet that has genomic features on rows (gene lists, ANOVA results, SNP detection) and selecting either Browse to Row or Browse to Location

  • In the Track Toolbar on the left, select New Track which invokes the dialog shown in Figure 21

  • Select Add an annotation track with genomic features from a selected annotation source and select Next

  • Next, choose the annotation file that was created from the BED file. The procedure for doing this might vary slightly depending on the type of spreadsheet you have displayed in the Genome Browser. You may be shown a list of annotations that includes the annotation source you have created; in this case, select the radio button in the Available Annotations panel. If, however,

  • At the bottom of the screen, check or uncheck the box next to Separate strands. Check it if the BED file contains strand information for each region and you wish to visualize the regions on different strands. Uncheck it if the BED file does not contain strand information or if you do not wish to display the regions on separate strands. Select Create

GO ANOVA, GSEA/GeneSet ANOVA, and Pathway ANOVA

Because these features require intensity (or count) data as well as experimental groups, they cannot be performed on imported lists.

Integrating Imported Data

If the data from imported spreadsheets has been associated with annotations, several approaches may be used to integrate multiple kinds of imported data. For instance, the Genome Browser may be used to display data from multiple spreadsheets/experiments regardless of the type of spreadsheet (imported data, microarray, or NGS experiments). The Venn Diagram tool may be used to find overlaps based on a feature name. Tools > Find Overlapping Regions can use an imported gene list and a list of regions from a copy number or ChIP-Seq experiment to identify genomic regions in common.
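Name-based overlap, as opposed to coordinate-based overlap, amounts to set operations on the feature names. The Python sketch below illustrates the idea for two lists; it is not the Venn Diagram tool itself, the function name is hypothetical, and it assumes names match only when they are spelled identically (including case).

```python
def venn_counts(names_a, names_b):
    """Split two lists of feature names into the three areas of a two-set Venn diagram."""
    a, b = set(names_a), set(names_b)
    return {"only_a": a - b, "both": a & b, "only_b": b - a}

# Made-up example gene lists.
imported_genes = ["TP53", "BRCA1"]
experiment_genes = ["BRCA1", "MYC"]
print(venn_counts(imported_genes, experiment_genes))
```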

 
