For Partek® Genomics Suite® to recognize an annotation spreadsheet, it must meet several requirements. First, there must be a column header row in the annotation file. Second, there must be a column in the annotation file that matches the identifiers in your data spreadsheet. Third, any text field above the column header row must start with #. Fourth, the text fields must be tab or comma delimited.
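The four requirements above can be checked before loading a file. The following is a minimal sketch, not part of Partek Genomics Suite: the function name `check_annotation_text`, the sample identifiers, and the choice to treat the first non-empty line that does not start with `#` as the header row are all assumptions made for illustration.

```python
import csv
import io


def check_annotation_text(text, data_ids, id_column):
    """Return a list of problems found in annotation text (empty list = none found).

    Hypothetical helper: it only mirrors the requirements listed above.
    """
    lines = text.splitlines()

    # The first non-empty line that does not start with '#' is taken as the header row.
    header_index = next(
        (i for i, line in enumerate(lines) if line.strip() and not line.startswith("#")),
        None,
    )
    if header_index is None:
        return ["no column header row found"]

    # Fields must be tab or comma delimited; tab is tried first.
    header_line = lines[header_index]
    if "\t" in header_line:
        delimiter = "\t"
    elif "," in header_line:
        delimiter = ","
    else:
        return ["header row is neither tab nor comma delimited"]

    reader = csv.DictReader(io.StringIO("\n".join(lines[header_index:])), delimiter=delimiter)
    if id_column not in (reader.fieldnames or []):
        return [f"identifier column {id_column!r} not found in header"]

    # Every identifier in the data spreadsheet should appear in the annotation file.
    annotation_ids = {row[id_column] for row in reader}
    missing = sorted(set(data_ids) - annotation_ids)
    if missing:
        return [f"{len(missing)} data identifiers not in annotation file"]
    return []
```

Calling `check_annotation_text(open(path).read(), ids, "ProbeID")` with the identifiers from your data spreadsheet returns an empty list when no problem is detected. A free-text line above the header that lacks the leading `#` will be mistaken for the header, so it typically surfaces as a missing identifier column.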
We will illustrate associating a spreadsheet with an annotation file using an imported .txt data file from an Illumina HumanHT-12 v4.0 Gene Expression BeadChip array and the HumanHT-12 v4.0 Whole-Genome Manifest File (TXT Format) from Illumina.
Depending on how you imported the data, you may see a Configure Spreadsheet dialog (Figure 3). Select the most appropriate option for your data; here we have chosen Genomic microarray.
The Configure Genomic Properties dialog will now open. Select the appropriate option for Choose the type of genomic data; here we have chosen Gene Expression (Figure 4).
The Location of genomic features in spreadsheet section lets you specify whether genomic features (e.g. genes, miRNAs, probes, SNPs, CpGs) are represented by columns or rows.
The Choose chips/reference and annotation files section lets you specify the feature annotation file associated with the current spreadsheet.
If the genomic position information from the annotation file cannot be automatically parsed, the Configure Annotation dialog will launch. This dialog allows you to choose which columns in the annotation file give the identity and genomic location of the features in your data spreadsheet. There are four options depending on if and how chromosome coordinates are described in the annotation file.
The Choose the columns section displays the annotation file spreadsheet with options to choose which columns are the Marker ID, Chromosome, and Physical Position (Figure 3).
An index file is generated in the same folder as the annotation file. It has the same file name as the annotation file, but with the extension .idx. If you need to reconfigure the genomic location field in the annotation file, manually delete the .idx file and repeat the step above to regenerate the index for the annotation file.
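Deleting the index by hand can also be scripted. This is a minimal sketch with a hypothetical helper name, `remove_annotation_index`. Because the text does not say whether `.idx` replaces the original extension or is appended to it, the sketch checks both candidate names; that is an assumption, so confirm which file actually appears next to your annotation file before relying on it.

```python
from pathlib import Path


def remove_annotation_index(annotation_path):
    """Delete candidate .idx files that sit next to an annotation file.

    Assumption: the index is either 'name.idx' (extension replaced) or
    'name.ext.idx' (extension appended). Returns the paths that were removed.
    """
    annotation_path = Path(annotation_path)
    candidates = [
        annotation_path.with_suffix(".idx"),
        annotation_path.with_name(annotation_path.name + ".idx"),
    ]
    removed = []
    for candidate in candidates:
        if candidate.is_file():
            candidate.unlink()
            removed.append(candidate)
    return removed
```

After the index is removed, re-associating the annotation file in the dialog regenerates it, as described above.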
Annotation column with gene symbols or miRNA names: if the gene symbol field in the annotation file is labeled Gene Symbol, PGS uses this field to look up gene set/pathway databases. If the gene symbol or miRNA name field is labeled something else, e.g. SystematicName, you need to manually specify that field as the gene symbol field.
Note: Species and gene symbol information is required for biological interpretation analysis.
To verify that the annotation has been added, we can try adding annotation information to the spreadsheet, which is possible when the features are on rows in the spreadsheet.
The Column Configuration section of the Add Rows/Columns to Spreadsheet dialog should contain all the feature annotations from the annotation file spreadsheet (Figure 6). Here we selected ILMN_Gene, which will add gene name information as a column next to 1. ID_REF.
Annotation files for most commercial arrays are available from the chip manufacturer. If you have a custom chip or want to use a customized annotation file, you can create an annotation file that will allow you to add annotations to your features (e.g. probe IDs) when the features are represented by rows on the spreadsheet. Your annotation file must meet the following criteria:
To invoke a genome view of your data, your annotation file must also have one or more columns that contain the genomic location in a format that Partek can recognize.
The annotation file must also contain columns that give the chromosome and base-pair location (start and stop, or physical position). Cytoband and/or strand can also be included. The table below provides possible column labels, a description of the format for each field, and an example.
Column label | Description of format | Example |
---|---|---|
chromosome | a chromosome label | 3 |
start | an integer, the start position (in base pairs) of the feature | 69871322 |
stop | an integer, the stop position (in base pairs) of the feature | 70100176 |
genomic_coordinates | chromosome:start-stop | 3:69871322-70100176 |
strand | + for top, - for bottom | + |
physical position | an integer, the position (in base pairs) of the feature | 70100176 |
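The `genomic_coordinates` format in the table (chromosome:start-stop) can be checked programmatically when preparing a custom file. The following is a minimal sketch; the function name and the regular expression are assumptions for illustration and do not describe how Partek itself parses the field. The chromosome label is returned exactly as written, with or without a `chr` prefix.

```python
import re

# chromosome label, a colon, then integer start and stop separated by a hyphen
COORD_PATTERN = re.compile(r"^(?P<chromosome>[^:\s]+):(?P<start>\d+)-(?P<stop>\d+)$")


def parse_genomic_coordinates(value):
    """Split 'chromosome:start-stop' into (chromosome, start, stop)."""
    match = COORD_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not in chromosome:start-stop format: {value!r}")
    return match["chromosome"], int(match["start"]), int(match["stop"])


print(parse_genomic_coordinates("3:69871322-70100176"))
```

Running a check like this over every row before loading the file makes malformed coordinates easy to locate.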
Here are a few examples of the first two rows of annotation files:
ProbeID | GeneName | GenomicCoordinates | Cytoband |
---|---|---|---|
A_44_P1025812 | TC521361 | chr12:2546883-2546824 | rn\|12p12 |
Probe Set ID | Chromosome | Physical Position | Strand | Cytoband |
---|---|---|---|---|
SNP_A-1512540 | 9 | 22205296 | - | p21.3 |
probeset_id | seqname | strand | start | stop |
---|---|---|---|---|
2315588 | chr1 | + | 1155398 | 1155624 |
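A custom annotation file in the layout of the last example can be produced with a few lines of Python. This is a minimal sketch, assuming a tab-delimited file with optional `#` comment lines above the header; the helper name `build_annotation_text` and the comment text are hypothetical.

```python
import csv
import io


def build_annotation_text(rows, columns, comments=()):
    """Return tab-delimited annotation text: '#' comment lines, header row, data rows."""
    buffer = io.StringIO()
    # Any text above the column header row must start with '#'.
    for comment in comments:
        buffer.write(f"# {comment}\n")
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


columns = ["probeset_id", "seqname", "strand", "start", "stop"]
rows = [
    {"probeset_id": "2315588", "seqname": "chr1", "strand": "+", "start": 1155398, "stop": 1155624},
]
print(build_annotation_text(rows, columns, comments=["custom annotation example"]))
```

Writing the returned text to a `.txt` file gives a file with a header row, a feature identifier column, and start/stop location columns, matching the criteria described earlier.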