Partek Flow Documentation

The library file management tool in Partek® Flow® provides an easy way to create, process and manage reference sequences, cytoband files, annotation models, aligner indexes, gene sets, variant databases and microarray probe sequence files.


This user guide will cover the following topics:

...


To access the library file management page click the avatar in the top right corner and choose Settings. Then click Library file management on the left.


The library file management page has two tabs - Genomic library files and Microarray library files. This section of the user guide will focus on the Genomic library files tab, which is relevant for next-generation sequencing analysis (Figure 5).

...


The library files associated with the selected assembly are organized into six major sections (Figure 5, above).


Below is some information on each section. For more detail on adding library files, see the Adding library files to an assembly on the library file management page section of this user guide.


Reference Files. This section includes two types of library file: reference sequence and cytoband files.


Reference sequences are the chromosome/scaffold/contig DNA sequences for a species. A reference sequence file is typically in FASTA or 2bit format. The reference sequence of a species is used for aligner index creation, variant detection against the reference sequence and visualization of the reference sequence in the Chromosome view.
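As an illustration of what a FASTA reference sequence file contains, the sketch below lists each chromosome/scaffold/contig name with its length. It is not part of Partek Flow; the function name and the toy sequences are hypothetical, and 2bit files (a binary format) are not handled.

```python
from io import StringIO


def fasta_lengths(handle):
    """Return {sequence_name: length} for a FASTA-formatted text stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: add its bases to the current record.
            lengths[name] += len(line)
    return lengths


# Toy input standing in for a real reference sequence file.
example = StringIO(">chr1 toy contig\nACGTAC\nGT\n>chr2\nNNNNACGT\n")
for seq_name, length in fasta_lengths(example).items():
    print(seq_name, length)
```

A check like this can be useful before importing a custom reference sequence, for example to confirm that the sequence names match those used in your annotation files.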


Cytoband files are used for drawing ideograms of chromosomes in the Chromosome view, including positions of cytogenetic bands if known.


Reference aligner indexes. Next-generation sequencing aligners require the reference sequence to be indexed prior to alignment, as this greatly increases alignment speed. An index consists of a set of files (Figure 7) and is generally aligner specific. For example, if you wish to align using BWA, you need a BWA index.

 

Figure 7: BWA reference aligner index files for human hg18 assembly


Some of the supported aligners share indexes. If you want to align using TopHat, the Bowtie aligner indexes can be used. If you want to align using TopHat2, the Bowtie2 aligner indexes can be used.
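The index sharing described above can be summarized as a small lookup. This is only a sketch: the function names are hypothetical, names are compared case-insensitively, and any aligner not listed is assumed to need an index of its own name.

```python
# Index sharing stated in the text: Tophat uses Bowtie indexes and
# Tophat2 uses Bowtie2 indexes. Keys and values are lower-cased names.
SHARED_INDEX = {"tophat": "bowtie", "tophat2": "bowtie2"}


def required_index(aligner):
    """Return the (lower-cased) index type that an aligner needs."""
    key = aligner.lower()
    # Assumption for this sketch: unlisted aligners use their own index.
    return SHARED_INDEX.get(key, key)


def has_usable_index(aligner, available_indexes):
    """True if one of the available index types can serve the aligner."""
    available = {name.lower() for name in available_indexes}
    return required_index(aligner) in available


print(required_index("Tophat2"))
print(has_usable_index("Tophat", ["BWA", "Bowtie2"]))
```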


Some aligner indexes are version specific, so care must be taken if you change aligner versions. For example, the index files for STAR version 2.4.1d differ from those for older versions of STAR.


This section contains aligner indexes for aligning to the whole genome. If you wish to align to a subset of the genome, e.g. targeted amplicons or the transcriptome, you must generate these indexes in the Annotation models section.


Gene sets. Gene set files are required for biological interpretation analyses (e.g. GO enrichment). Genes are grouped together according to their biological function. Gene set files have to be in GMT format, where each row represents one gene set. The first column of a GMT file is the GO ID or gene set name. The second column is an optional text description. Subsequent columns are the gene symbols that belong to each gene set.

Gene ontologies for various model organisms are available for automatic download from the Partek repository (source: geneontology.org). Because gene ontologies are frequently updated, geneontology.org is checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website (http://www.partek.com/library-files-updates).
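To make the GMT layout described above concrete, here is a minimal sketch of a parser for tab-delimited GMT rows. It is not Partek Flow code; the function name, set names and gene symbols are hypothetical placeholders.

```python
def parse_gmt(lines):
    """Parse GMT rows into {set_name: {"description": str, "genes": list}}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # A usable row needs a set name and the (possibly empty) description.
        if len(fields) < 2 or not fields[0]:
            continue
        name, description = fields[0], fields[1]
        # Remaining columns are gene symbols; drop empty trailing cells.
        genes = [gene for gene in fields[2:] if gene]
        gene_sets[name] = {"description": description, "genes": genes}
    return gene_sets


# Toy rows: column 1 = set name, column 2 = description, then gene symbols.
rows = [
    "SET_A\ttoy description\tGENE1\tGENE2\tGENE3\n",
    "SET_B\t\tGENE2\tGENE4\n",
]
for set_name, info in parse_gmt(rows).items():
    print(set_name, info["description"], info["genes"])
```

Reading a custom gene set file this way before adding it is one way to spot rows with a missing name or no genes.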


Variant annotations. Variant annotation databases are collections of known genomic variants (e.g. single nucleotide polymorphisms). If you have performed a variant detection study, detected variants can be searched against variant annotation library files to see if the detected variants are known from previous studies. Furthermore, you can validate detected variants against 'gold-standard' variant annotation library files. Variant annotation files are typically in VCF format.
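The idea of searching detected variants against a database of known variants can be sketched as follows. This is a simplified illustration, not how Partek Flow performs the comparison: it assumes both inputs are uncompressed VCF text, matches on exact chromosome, position, reference and alternate allele, and does no variant normalization. All names and records are hypothetical.

```python
def vcf_keys(lines):
    """Yield (chrom, pos, ref, alt) for every ALT allele in VCF data lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/meta lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, _variant_id, ref, alt = fields[:5]
        for allele in alt.split(","):
            yield (chrom, int(pos), ref, allele)


def split_known_novel(detected_lines, known_lines):
    """Return (known, novel) lists of detected variants."""
    known_set = set(vcf_keys(known_lines))
    detected = list(vcf_keys(detected_lines))
    known = [v for v in detected if v in known_set]
    novel = [v for v in detected if v not in known_set]
    return known, novel


# Toy records; real databases such as dbSNP are far larger.
database = ["#CHROM\tPOS\tID\tREF\tALT\n", "chr1\t100\ttoy_id_1\tA\tG,T\n"]
calls = ["chr1\t100\t.\tA\tT\n", "chr2\t5\t.\tC\tG\n"]
known, novel = split_known_novel(calls, database)
print("known:", known)
print("novel:", novel)
```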


Variant annotation databases from commonly used sources (e.g. dbSNP) are available for automatic download from the Partek repository. Because variant annotation databases are frequently updated, these sources are checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website (http://www.partek.com/library-files-updates).


SnpEff variant databases. SnpEff (1) is a variant annotation and effect prediction tool that requires its own variant annotation files, separate from the other Variant annotation library files. If you wish to use SnpEff, library files need to be added to this section.


Annotation models. This section includes two types of library file: annotation models and aligner indexes.


Annotation models describe genomic features (e.g. genes, transcripts, microRNAs) for a specific version of the reference sequence. Annotation models contain labels (e.g. gene ID) and genomic coordinates (e.g. chromosome, start and stop positions) for each feature.


Annotation models will appear in separate tables (Figure 8). If you have multiple versions of annotation models from the same source, it is advisable to distinguish them by their date or version number.

Annotation models from commonly used sources (e.g. RefSeq, ENSEMBL) are available for automatic download from the Partek repository. Because annotation models are frequently updated, these sources are checked for updates quarterly. You can check for recent updates to the Partek repository on the Partek website (http://www.partek.com/library-files-updates).


Annotation models are used for quantification in gene expression analyses, annotating detected variants (e.g. to predict amino acid changes), visualizations in Chromosome view, generating coverage reports and for aligner index creation (see below). Typical file formats include GTF, GFF, GFF3 and BED.
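As an illustration of the labels and coordinates an annotation model carries, the sketch below parses one line of a GTF file (nine tab-separated columns, with attributes such as gene_id in the last column). It is a simplified reader with hypothetical names and a toy record; GFF3 and BED use different layouts and are not covered.

```python
import re

# GTF attributes look like: gene_id "X"; transcript_id "Y";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line):
    """Return a dict describing one GTF feature, or None if not a record."""
    if line.startswith("#"):
        return None  # comment or header line
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return None  # GTF records have exactly nine columns
    seqname, _source, feature, start, end, _score, strand, _frame, attrs = fields
    return {
        "chromosome": seqname,
        "feature": feature,
        "start": int(start),
        "stop": int(end),
        "strand": strand,
        "attributes": dict(ATTRIBUTE.findall(attrs)),
    }


# Toy record: identifiers are placeholders, not real annotations.
record = parse_gtf_line(
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";\n'
)
print(record["chromosome"], record["start"], record["stop"])
print(record["attributes"]["gene_id"])
```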

 

Figure 8: Annotation models are displayed in separate tables.

 

The gray arrows next to the annotation model name expand/collapse each table. The three annotation models displayed are different versions from the same source (RefSeq), distinguishable by their date. Aligner indexes (e.g. for alignment to the transcriptome) are added to the table of the corresponding annotation model.


The aligner indexes in the Annotation models section are required if you wish to align to a subset of the genome as defined by the annotation model, e.g. target amplicons or the transcriptome. The reference sequence is still required to generate an aligner index for an annotation model. As with whole genome alignment, indexes are aligner specific, although some aligners share indexes, and some indexes are version specific (see Reference aligner indexes above). The aligner indexes generated will be added to the corresponding annotation model table (Figure 8, above).

...

 

Figure: Add reference sequence dialog. If the reference sequence and cytoband files have not been added yet, both options will appear in the Library type drop-down list (left). If one is missing, it will appear as the only option (right). For many model organisms, automatic downloads are available from the Partek repository.


If you are using an assembly supported by Partek (e.g. human), there are two radio button options: Download cytoband and Create cytoband from 2bit (Figure 25). Select Download cytoband and click Create to get the cytoband file from the Partek repository. Alternatively, select Create cytoband from 2bit and click Create to build the cytoband file. If the reference sequence is missing, it will either be downloaded automatically or you will be asked to import it from another source (see Adding a reference sequence).


If you are using a custom assembly (e.g. for a non-model organism), only the Create cytoband from 2bit option is available (Figure 26).

...


Note that this task is for adding indexes for alignment to the whole genome. If you want to align to the transcriptome or another set of genomic features, see Adding Aligner Indexes Based on an Annotation Model below.


Click the green plus icon next to the Reference aligner indexes section header. Alternatively, click the Add library file button, choose Aligner index from the Library type drop-down list (Figure 17, above) and whole genome from the Index to drop-down list. If an aligner index is already associated with an assembly, it will not appear in the Aligner drop-down list. If all but one of the possible aligner indexes have been added, the remaining aligner index will be the only option and will not appear in a drop-down list (Figure 27).

...


If you are using an assembly supported by Partek (e.g. human), there are three radio button options: Download index, Build index and Import index (Figure 27). Certain aligner indexes may not be available for automatic download because the file sizes are too large to download efficiently.


If available, select Download index and click Create to get the chosen reference aligner index from the Partek repository.


Alternatively, select Build index and click Create to build the reference aligner index. To build an aligner index, a reference sequence file must already be associated with the assembly. Depending on the aligner, you may have to specify further parameters. Consult the user documentation for each aligner for guidance (usually available online).


Alternatively, select Import index and click Create to add an aligner index from another source. An aligner index can be added from the Partek Flow Server, My Computer or a URL download link. The behavior of each option is similar to when importing a reference sequence (see Adding a reference sequence, above). When browsing for files on the Partek Flow server, only the files with relevant file extensions will be visible. This will vary for each aligner.


For custom assemblies (e.g. for non-model organisms), only the Build index and Import index options are available (Figure 28).
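The choices described in this section can be summarized as a small decision rule. The sketch below is only a reading aid with hypothetical names; it does not describe how the dialog is implemented, and it ignores the case where a particular index is too large to be offered for download.

```python
def usable_index_options(partek_supported, has_reference_sequence):
    """List the aligner index options that can be used for an assembly.

    partek_supported: True for assemblies supported by Partek (e.g. human),
        False for custom assemblies.
    has_reference_sequence: True if a reference sequence file is already
        associated with the assembly (needed to build an index).
    """
    options = []
    if partek_supported:
        # Only supported assemblies offer downloads from the Partek repository.
        options.append("Download index")
    if has_reference_sequence:
        # Building an index requires the reference sequence.
        options.append("Build index")
    # Importing from the server, local computer or a URL is always possible.
    options.append("Import index")
    return options


print(usable_index_options(partek_supported=True, has_reference_sequence=True))
print(usable_index_options(partek_supported=False, has_reference_sequence=True))
```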

...

 

Figure: For human - hg19, automatic downloads of various variant annotation databases are available from the Partek repository.


Choose a database from the drop-down list, select the Download variant database radio button and click Create.


If you prefer to add a custom variant annotation database, perhaps from another source or 'gold-standard' validated variants, choose Add variant database from the Variant annotation drop-down list (Figure 32). Name the variant annotation database by typing into the Custom Name box and click Create. A variant annotation database can be added from the Partek Flow Server, My Computer or a URL download link. The behavior of each option is similar to when importing a reference sequence (see Adding a reference sequence, above). When browsing for files on the Partek Flow server, only the files with relevant file extensions will be visible (.vcf and various compressed formats).

...


Click the green plus icon next to the SNPEff variant databases section header. Alternatively, click the Add library file button and choose SNPEff variant database from the Library type drop-down list (Figure 17, above).


If you are using human (hg19 and hg38), mouse (mm10) or rat (rn5 and rn6) assemblies, various versions of SNPEff variant databases are available for automatic download (Figure 33).

...


Click the green plus icon next to the Annotation models section header and choose Gene/feature annotation from the Library type drop-down list in the dialog (Figure 34). Alternatively, click the Add library file button and choose Gene/feature annotation from the Library type drop-down list (Figure 17, above).


If you are using an assembly supported by Partek (e.g. human), annotation models from a variety of commonly used sources (e.g. RefSeq, ENSEMBL, GENCODE) will appear in the Annotation model drop-down list in the dialog. Choose an annotation model, select the Download annotation file radio button and click Create (Figure 34).

...


Note that this task is for adding indexes for alignment to a subset of the genome (e.g. the transcriptome). If you want to align to the whole genome, see Adding Reference Aligner Indexes above.


Click the green plus icon next to the Annotation models section header and choose Aligner index from the Library type drop-down list in the dialog (Figure 37). Alternatively, click the Add library file button and choose Aligner index from the Library type drop-down list (Figure 17, above).


Choose the aligner you wish to use from the Aligners drop-down list (Figure 37). All 10 aligners are available for indexing to an annotation model.


The annotation model(s) that have already been associated with an assembly will appear at the top of the Index to drop-down list. Choose the annotation model you wish to index to, select the Build index radio button and click Create (Figure 37). To build an aligner index based on an annotation model, a reference sequence file must already be associated with the assembly.


If you are using an assembly supported by Partek (e.g. human), annotation models from a variety of commonly used sources will appear in the Index to drop-down list in addition to the ones that have already been associated with the assembly. If you choose an annotation model that has not already been associated, it will automatically be downloaded prior to building the index.

...

 

Figure 43: Microarray library files tab


Microarray probe tab files are used for processing microarray data in Partek Flow. When microarray intensity data files (e.g. Affymetrix .CEL files) are imported into a project, the chip type is automatically detected and the appropriate probe tab annotation file is downloaded. Thus, you would normally not need to manually add any probe tab annotation files.


To manually download a probe tab file, click the green Add probe sequence button at the top of the page (Figure 43, above). Choose the chip name from the drop-down list in the dialog, select the Download probe sequence radio button and click Create (Figure 44). If a chip has already been added, it will not appear in the Chip name drop-down list. We currently support automatic downloads of a broad variety of Affymetrix and Illumina microarray chips.

...