Partek Flow Documentation

The library files associated with the selected assembly are organized into several sections. Below is some information on each section.

Reference Files

This section includes two types of library file: reference sequence and cytoband files.

Reference sequences are the chromosome/scaffold/contig DNA sequences for a species. A reference sequence file is typically in FASTA or 2bit format. The reference sequence of a species is used for aligner index creation, variant detection against the reference sequence and visualization of the reference sequence in the Chromosome view.

Cytoband files are used for drawing ideograms of chromosomes in the Chromosome view, including the positions of cytogenetic bands where known.

...

Gene set files are required for biological interpretation analyses (e.g. GO enrichment). Genes are grouped together according to their biological function. Gene set files must be in GMT format, a tab-delimited text format in which each row represents one gene set. The first column of a GMT file is the GO ID or gene set name. The second column is an optional text description. Subsequent columns are the gene symbols that belong to the gene set. Gene ontologies for various model organisms are available for automatic download from the Partek repository (source: geneontology.org). Because gene ontologies are frequently updated, geneontology.org is checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website.
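As an illustration of the GMT layout described above, the following minimal Python sketch reads GMT rows into a dictionary. It is not part of Partek Flow; the function name parse_gmt and the two example rows (including their gene memberships) are made up for illustration only.

```python
def parse_gmt(lines):
    """Parse GMT rows into {gene set name: (description, [gene symbols])}.

    Hypothetical helper for illustration; not a Partek Flow function.
    """
    gene_sets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")  # GMT is tab-delimited
        if len(fields) < 2:
            raise ValueError(f"Expected at least 2 columns, got: {line!r}")
        name, description = fields[0], fields[1]
        # Columns 3 onwards are gene symbols; drop empty trailing cells
        genes = [gene for gene in fields[2:] if gene]
        gene_sets[name] = (description, genes)
    return gene_sets


# Made-up example rows: ID, description, then gene symbols
example_rows = [
    "GO:0006915\tapoptotic process\tTP53\tBAX\tCASP3",
    "GO:0007049\tcell cycle\tCDK1\tCCNB1",
]
gene_sets = parse_gmt(example_rows)
for name, (description, genes) in gene_sets.items():
    print(name, description, len(genes))
```

Reading a custom gene set file this way before adding it as a library file can help catch rows that are space-delimited rather than tab-delimited, a common cause of malformed GMT files.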

Variant annotations

Variant annotation databases are collections of known genomic variants (e.g. single nucleotide polymorphisms). If you have performed a variant detection study, detected variants can be searched against variant annotation library files to see if the detected variants are known from previous studies. Furthermore, you can validate detected variants against 'gold-standard' variant annotation library files. Variant annotation files are typically in VCF format.
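The idea of searching detected variants against a database of known variants can be sketched in Python as below. This is a simplified illustration under stated assumptions, not a description of how Partek Flow performs the lookup: it matches variants on exact chromosome, position, reference allele and alternate allele taken from the first five VCF columns, and does no allele normalization. The function names and the example records are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines.

    Hypothetical helper; header lines starting with '#' are skipped.
    """
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        # The first five VCF columns are CHROM, POS, ID, REF, ALT
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic sites list ALTs with commas
            keys.add((chrom, int(pos), ref, allele))
    return keys


def split_known_novel(detected_lines, annotation_lines):
    """Return (known, novel) sets of detected variants."""
    known_db = variant_keys(annotation_lines)
    detected = variant_keys(detected_lines)
    return detected & known_db, detected - known_db


# Made-up records for illustration only
annotation = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\trs0000001\tA\tG,T\t.\t.\t.",
]
detected = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tT\t50\tPASS\t.",
    "chr1\t2000\t.\tC\tA\t60\tPASS\t.",
]
known, novel = split_known_novel(detected, annotation)
print("known:", sorted(known))
print("novel:", sorted(novel))
```

In this sketch, the chr1:1000 A>T variant is reported as known because it matches one of the alternate alleles in the annotation record, while chr1:2000 C>A is reported as novel.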

Variant annotation databases from commonly used sources (e.g. dbSNP) are available for automatic download from the Partek repository. Because variant annotation databases are frequently updated, these sources are checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website.

SnpEff variant databases

SnpEff [1] is a variant annotation and effect prediction tool that requires its own variant annotation files, separate from the other Variant annotation library files. If you wish to use SnpEff, its library files need to be added to this section.

VEP database

The Ensembl Variant Effect Predictor (VEP) is another variant annotation and effect prediction tool that requires its own annotation files, separate from the Variant annotation library files. If you wish to use VEP, its library files need to be added to this section.

Annotation models

This section includes two types of library file: annotation models and aligner indexes.

...

Annotation models will appear in separate tables (Figure 2). If you have multiple versions of annotation models from the same source, it is advisable to distinguish them by their date or version number. 

Annotation models from commonly used sources (e.g. RefSeq, Ensembl) are available for automatic download from the Partek repository. Because annotation models are frequently updated, these sources are checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website.

Annotation models are used for quantification in gene expression analyses, annotating detected variants (e.g. to predict amino acid changes), visualizations in Chromosome view, generating coverage reports and for aligner index creation (see Adding Aligner Indexes Based on an Annotation Model). Typical file formats include GTF, GFF, GFF3 and BED.

 

Figure 2. Annotation models are displayed in separate tables

...

The aligner indexes in the Annotation models section are required if you wish to align to a subset of the genome as defined by the annotation model, e.g. target amplicons or the transcriptome. The reference sequence is still required to generate an aligner index for an annotation model. As with whole genome alignment, indexes are specific to both the aligner and its version, although some aligners share indexes (see Reference aligner indexes). The aligner indexes generated will be added to the corresponding annotation model table (Figure 2).

...