Partek Flow Documentation

The library files associated with the selected assembly are organized into six major sections, each described below. For more detail on adding library files, see the Adding library files to an assembly on the library file management page section of this user guide.

Reference Files

...

This section includes two types of library file: reference sequence and cytoband files.

...

Cytoband files are used for drawing ideograms of chromosomes in the Chromosome view, including positions of cytogenetic bands if known. 

Reference aligner indexes

...

Next-generation sequencing aligners require the reference sequence to be indexed prior to alignment, as this greatly increases alignment speed. An index consists of a set of files (Figure 1) and is generally aligner specific. For example, if you wish to align using BWA, you need a BWA index.

...

Some of the supported aligners share indexes. If you want to align using TopHat, the Bowtie aligner indexes can be used. If you want to align using TopHat2, the Bowtie2 aligner indexes can be used.

 

Some aligner indexes are version specific, so care must be taken if you change aligner versions. For example, the index files for STAR version 2.4.1d differ from those for older versions of STAR.

This section contains aligner indexes for aligning to the whole genome. If you wish to align to a subset of the genome, e.g. targeted amplicons or the transcriptome, you must generate these indexes in the Annotation models section.

Gene sets

...

Gene set files are required for biological interpretation analyses (e.g. GO enrichment). In a gene set file, genes are grouped according to their biological function. Gene set files must be in GMT format, a tab-delimited text format in which each row represents one gene set. The first column of a GMT file is the GO ID or gene set name. The second column is an optional text description. Subsequent columns are the gene symbols that belong to that gene set.

Gene ontologies for various model organisms are available for automatic download from the Partek repository (source: geneontology.org). Because gene ontologies are frequently updated, geneontology.org is checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website.
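To make the GMT layout concrete, the following is a minimal Python sketch that reads GMT text into a dictionary. It is not part of Partek Flow. The function name parse_gmt is hypothetical, and the gene memberships in the example rows are made up for illustration only.

```python
import io


def parse_gmt(handle):
    """Parse GMT text into {gene set name: (description, [gene symbols])}.

    Hypothetical helper for illustration; assumes tab-delimited rows.
    """
    gene_sets = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_number}: expected a name, a description "
                "and at least one gene symbol"
            )
        # Column 1: GO ID or gene set name. Column 2: description (may be empty).
        name, description = fields[0], fields[1]
        # Remaining columns: gene symbols; drop empty trailing cells.
        genes = [symbol for symbol in fields[2:] if symbol]
        gene_sets[name] = (description, genes)
    return gene_sets


# Two illustrative rows; the second has an empty description column.
example = io.StringIO(
    "GO:0006915\tapoptotic process\tTP53\tBAX\tCASP3\n"
    "GO:0007049\t\tCDK1\tCCNB1\n"
)
gene_sets = parse_gmt(example)

for name, (description, genes) in gene_sets.items():
    print(name, description or "(no description)", ", ".join(genes))
```

A check like this can be useful before adding a custom gene set file, since a row with fewer than three columns has no genes and is reported with its line number.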

Variant annotations

...

Variant annotation databases are collections of known genomic variants (e.g. single nucleotide polymorphisms). If you have performed a variant detection study, detected variants can be searched against variant annotation library files to see if the detected variants are known from previous studies. Furthermore, you can validate detected variants against 'gold-standard' variant annotation library files. Variant annotation files are typically in VCF format.
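The idea of searching detected variants against a database of known variants can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Partek Flow performs the lookup: the function names load_known_variants and annotate are hypothetical, the records and IDs are invented, and variants are matched only on an exact (chromosome, position, reference allele, alternate allele) key.

```python
import io


def load_known_variants(handle):
    """Map (chrom, pos, ref, alt) to the ID column of each VCF record.

    Hypothetical helper; reads only the first five fixed VCF columns
    (CHROM, POS, ID, REF, ALT).
    """
    known = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos, variant_id, ref, alts = line.rstrip("\r\n").split("\t")[:5]
        # A record may list several comma-separated alternate alleles.
        for alt in alts.split(","):
            known[(chrom, int(pos), ref, alt)] = variant_id
    return known


def annotate(detected, known):
    """Pair each detected variant with its known ID, or None if it is novel."""
    return [(variant, known.get(variant)) for variant in detected]


# Invented records for illustration only.
known_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t12345\texample_id_1\tA\tG\t.\t.\t.\n"
    "chr2\t67890\texample_id_2\tC\tT,G\t.\t.\t.\n"
)
known = load_known_variants(known_vcf)

detected = [
    ("chr1", 12345, "A", "G"),
    ("chr2", 67890, "C", "G"),
    ("chr3", 111, "T", "C"),
]
annotated = annotate(detected, known)

for variant, variant_id in annotated:
    print(variant, variant_id if variant_id else "not in database")
```

Exact-key matching is a simplification: it assumes the detected variants and the annotation file use the same chromosome naming and the same representation of each variant.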

Variant annotation databases from commonly used sources (e.g. dbSNP) are available for automatic download from the Partek repository. Because variant annotation databases are frequently updated, these sources are checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website.

SnpEff variant databases

...

SnpEff (1) is a variant annotation and effect prediction tool that requires its own variant annotation files, separate from the other Variant annotation library files. If you wish to use SnpEff, library files need to be added to this section.

Annotation models

...

This section includes two types of library file: annotation models and aligner indexes.

...

Annotation models are used for quantification in gene expression analyses, annotating detected variants (e.g. to predict amino acid changes), visualizations in Chromosome view, generating coverage reports and for aligner index creation (see Adding Aligner Indexes Based on an Annotation Model). Typical file formats include GTF, GFF, GFF3 and BED.

...