Selecting a node with unaligned reads (either Unaligned reads or Trimmed reads) shows the QA/QC section in the toolbox, with two options (Figure 1). To assess the quality of your raw reads, use Pre-alignment QA/QC.

Figure 1. QA/QC options on unaligned reads

The Pre-alignment QA/QC setup dialog is shown in Figure 2. Examine reads controls the number of reads processed by the tool: All reads, or a subset (One of every n reads). The latter option is less thorough, but much faster than All reads.
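The idea behind One of every n reads can be illustrated with a minimal Python sketch. It is not Partek Flow's implementation: the function names `parse_fastq` and `one_of_every_n` are hypothetical, and the parser assumes plain four-line FASTQ records.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one (header, sequence, quality) tuple per 4-line FASTQ record."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:  # end of input (or a truncated final record)
            return
        header, sequence, _separator, quality = record
        yield header, sequence, quality


def one_of_every_n(records: Iterable[FastqRecord], n: int) -> Iterator[FastqRecord]:
    """Keep records 0, n, 2n, ... so roughly 1/n of the reads are examined."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return islice(records, 0, None, n)
```

With n = 1 every read is kept, which corresponds to All reads; larger values of n trade thoroughness for speed because most records are skipped without being analysed.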

If selected, K-mer length creates a per-sample report with the positions of the most frequent k-mers (i.e. sequences of k nucleotides) of the length specified in the dialog. Input values range from 1 to 10.

The last control applies to .fastq files. Partek® Flow® can automatically detect the quality encoding scheme (Auto detect), or you can choose one of the options in the drop-down list. Auto-detection works only for the Phred+33 and Phred+64 quality encodings; for the early Solexa quality encoding, select Solexa+64 from the Quality encoding drop-down list. For paired-end data, pre-alignment QA/QC is performed on each read of the pair separately, and the results are also shown separately.

Figure 2. Pre-alignment QA/QC setup dialog (defaults)
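The quality encodings offered in the dialog differ in how a quality character maps to a score. The Python sketch below shows the standard decoding for each scheme and a simple offset heuristic. It is an illustration only, not the detection logic used by Partek Flow; `decode_quality` and `guess_encoding` are hypothetical names, and the thresholds in `guess_encoding` are assumptions based on the usual ASCII ranges of each encoding.

```python
import math
from typing import Iterable, List, Optional

# ASCII offset of each encoding named in the Quality encoding list.
OFFSETS = {"Phred+33": 33, "Phred+64": 64, "Solexa+64": 64}


def decode_quality(quality: str, encoding: str) -> List[float]:
    """Convert a quality string to Phred-scaled scores."""
    offset = OFFSETS[encoding]
    raw = [ord(char) - offset for char in quality]
    if encoding == "Solexa+64":
        # Solexa scores use a different scale; convert them to Phred.
        return [10 * math.log10(10 ** (q / 10) + 1) for q in raw]
    return [float(q) for q in raw]


def guess_encoding(quality_strings: Iterable[str]) -> Optional[str]:
    """Heuristic guess; returns None when the data do not decide it."""
    codes = [ord(char) for q in quality_strings for char in q]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:   # characters below ';' occur only with offset 33
        return "Phred+33"
    if lowest >= 64:  # assumption: no low-quality bases were missed
        return "Phred+64"
    return None       # 59-63 could be Solexa+64 or high-quality Phred+33
```

A heuristic of this kind can be wrong on a small sample of high-quality reads, which is one reason a manual choice in the drop-down list is useful.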

The task report is organised in two tiers. The initial view shows a project-level report with all the samples. An overview table is at the top, with the matching plots below it.

The Pre-alignment QA/QC output table contains one input file per row, with typical metrics in the columns (%GC: fraction of GC content; %N: fraction of no-calls) (Figure 3). The file names are hyperlinks that lead to the sample-level reports. To save the table as a .txt file on a local computer, click the Download link. Table columns can be sorted using the double-arrow icon.

Figure 3. Pre-alignment QA/QC output table (project-level). Each row is an input file. %N: proportion of no-calls, %GC: GC content
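As an illustration of the %GC and %N columns, the following Python sketch computes both values for the reads of one file. The function name `gc_and_n_percent` is hypothetical, and the choice of denominator (all bases, including no-calls) is an assumption; the exact definition used by Partek Flow is not stated here.

```python
from typing import Iterable, Tuple


def gc_and_n_percent(sequences: Iterable[str]) -> Tuple[float, float]:
    """Return (%GC, %N) over all bases of all reads in one file."""
    total = gc = no_calls = 0
    for sequence in sequences:
        bases = sequence.upper()
        total += len(bases)
        gc += bases.count("G") + bases.count("C")
        no_calls += bases.count("N")
    if total == 0:  # empty input: avoid dividing by zero
        return 0.0, 0.0
    return 100.0 * gc / total, 100.0 * no_calls / total
```

For the two reads ACGT and GGNN, four of the eight bases are G or C and two are N, so the function returns 50.0 and 25.0.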

The two project-level plots are Average base quality per position and Average base quality score per read (Figure 4). The latter shows the proportion of reads (y-axis) with a given average quality score (x-axis), where the average is taken over all base qualities within a read. Mouse over a data point to see its values. The Save icon saves the plot in .svg format to the local machine. Each line on the plot represents a data file, and you can select sample names in the legend to hide or show individual lines.

Figure 4. Pre-alignment QA/QC project-level plots (each line is a file)
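The quantities behind the two plots can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's own code: the function names are hypothetical, quality strings are assumed to be Phred+33 unless another offset is passed, and per-read averages are rounded to whole scores for binning.

```python
from collections import Counter
from typing import Dict, List, Sequence


def mean_quality_per_position(qualities: Sequence[str], offset: int = 33) -> List[float]:
    """Average base quality at each read position (0-based) across reads."""
    sums: List[int] = []
    depths: List[int] = []
    for quality in qualities:
        for position, char in enumerate(quality):
            if position == len(sums):  # first read reaching this position
                sums.append(0)
                depths.append(0)
            sums[position] += ord(char) - offset
            depths[position] += 1
    return [s / d for s, d in zip(sums, depths)]


def read_fraction_by_mean_quality(qualities: Sequence[str], offset: int = 33) -> Dict[int, float]:
    """Proportion of reads at each (rounded) per-read average quality."""
    means = [
        round(sum(ord(char) - offset for char in quality) / len(quality))
        for quality in qualities
        if quality  # skip empty reads
    ]
    if not means:
        return {}
    counts = Counter(means)
    return {score: counts[score] / len(means) for score in sorted(counts)}
```

The first function corresponds to one line of the per-position plot, and the second to one line of the per-read plot, with scores on the x-axis and proportions on the y-axis.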

A sample-level report begins with a header, which is a collection of typical quality metrics (Figure 5).

Figure 5. Header of a sample-level pre-alignment QA/QC report

Below the header you will find four plots: Base composition, Average base quality score per position (same as above, but on the sample level), Distribution of base quality scores (the same as Average base quality score per read, but on the sample level), and Distribution of read lengths.

The Base composition plot shows the relative abundance of each base per position (Figure 6), with N standing for no-calls. Selecting individual bases in the legend removes them from the plot or brings them back. To zoom in, left-click and drag over a region of interest. To zoom out, use the Reset button to restore the original view, or the magnifying glass to zoom out one level.

Figure 6. Base composition plot: fraction of each base at a given position within a read. N: no call
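The per-position fractions shown in a base composition plot can be computed along these lines. The sketch is illustrative only; `base_composition` is a hypothetical name, and positions are 0-based. Characters other than A, C, G, T and N count towards the depth at a position but are not reported.

```python
from collections import Counter
from typing import Dict, Iterable, List


def base_composition(sequences: Iterable[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G, T and N at each read position."""
    per_position: List[Counter] = []
    for sequence in sequences:
        for position, base in enumerate(sequence.upper()):
            if position == len(per_position):  # extend for longer reads
                per_position.append(Counter())
            per_position[position][base] += 1
    composition = []
    for counts in per_position:
        depth = sum(counts.values())  # number of reads covering the position
        composition.append({base: counts[base] / depth for base in "ACGTN"})
    return composition
```

Because the depth is counted per position, reads of different lengths are handled: later positions are simply averaged over fewer reads.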

Distribution of read lengths (Figure 7) shows a single column for fixed-length data (e.g. Illumina sequencing). For quality-trimmed data or variable-length data (such as Ion Torrent sequencing), expect to see a distribution of read lengths.

Figure 7. Distribution of read lengths (an example processed by Ion Torrent sequencer is shown)
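The difference between fixed-length and variable-length data is easy to see in a short Python sketch of the underlying histogram. The function name `read_length_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def read_length_distribution(sequences: Iterable[str]) -> Dict[int, int]:
    """Map each read length to the number of reads with that length."""
    counts = Counter(len(sequence) for sequence in sequences)
    return dict(sorted(counts.items()))
```

Fixed-length input yields a dictionary with a single key (one column in the plot), whereas trimmed or variable-length input yields several keys.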

If the K-mer length option was turned on when setting up the task, an additional plot, K-mer Content, is added to the sample-level report (Figure 8). K-mer composition is given for each position, but only the six most frequent K-mers are reported. A high frequency of a K-mer at a given position (enrichment) may indicate the presence of sequencing adapters in the reads.

Figure 8. K-mer Content plot. Position of a K-mer is on the horizontal axis, while the K-mer frequency is on the y-axis. Top six K-mers are listed below the plot. In this example, the value of K was set to eight. The most frequent reported K-mer (CTGTCTCT) is a reverse complement of a commonly used adapter (AGAGACAG)
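The counting behind a K-mer Content plot, and the reverse-complement check mentioned in the Figure 8 caption, can be sketched in Python. This is a minimal illustration rather than the tool's algorithm; the function names are hypothetical, positions are 0-based, and ranking K-mers by their total count is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmer_position_counts(sequences: Iterable[str], k: int) -> Dict[str, Counter]:
    """For each k-mer, count how often it starts at each read position."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sequence in sequences:
        bases = sequence.upper()
        for start in range(len(bases) - k + 1):
            counts[bases[start:start + k]][start] += 1
    return counts


def top_kmers(counts: Dict[str, Counter], n: int = 6) -> List[str]:
    """Return the n most frequent k-mers, ranked by total count."""
    totals = Counter({kmer: sum(by_pos.values()) for kmer, by_pos in counts.items()})
    return [kmer for kmer, _ in totals.most_common(n)]


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]
```

Calling `reverse_complement("AGAGACAG")` returns `"CTGTCTCT"`, which is how a reported K-mer can be matched against a known adapter sequence on the opposite strand.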

The pre-alignment QA/QC report described above is generally available for NGS data in fastq format. For other types of data, the report may differ depending on the information available. For example, the fasta format carries no base quality scores, so all plots related to base or read quality scores are unavailable.
