Partek Flow Documentation


...

Trim bases from both ends (Figure 4) allows the user to keep only the bases between a fixed start and end position of each read. This is particularly useful when poor-quality bases are observed at both ends of the read: instead of trimming successively from the 5'-end and then from the 3'-end, the bases are trimmed in a single step from both ends.

 

Figure 4. Trim bases from both ends
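The fixed-position trim described above amounts to slicing each read and its quality string once. The minimal Python sketch below assumes 1-based, inclusive start and end positions. How Partek Flow numbers positions internally is an assumption here, and `trim_both_ends` is a hypothetical helper, not a Partek Flow API.

```python
from typing import Tuple


def trim_both_ends(seq: str, qual: str, start: int, end: int) -> Tuple[str, str]:
    """Keep only bases start..end of a read and its quality string.

    Assumption: positions are 1-based and inclusive; an end position past
    the read length is clipped to the read length.
    """
    if start < 1 or end < start:
        raise ValueError("require 1 <= start <= end")
    # Convert to a 0-based, half-open slice; Python slicing clips at the read end.
    return seq[start - 1:end], qual[start - 1:end]


if __name__ == "__main__":
    # Drop two poor-quality bases ('#') from each end in a single step.
    seq, qual = trim_both_ends("NNACGTACGTNN", "##IIIIIIII##", start=3, end=10)
    print(seq, qual)  # ACGTACGT IIIIIIII
```

Because both ends are removed in one slice, the read is processed once rather than in two successive 5'-end and 3'-end passes.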

 

...

Trim bases based on quality score (Figure 5) is probably the most useful function for trimming poor-quality bases from the 5'- or 3'-ends of reads. It trims bases dynamically, depending on their quality scores, and can trim from the 5'-end, the 3'-end, or both ends of the reads. Starting at the end of the read, the function evaluates each base and trims it away, stopping when the outermost remaining base has a quality score greater than the specified threshold. For an extensive evaluation of read trimming effects on Illumina NGS data analysis, see Del Fabbro et al. [1].

 

Figure 5. Trim bases based on quality score
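The quality-based trimming rule described above can be sketched in Python as follows. This is a minimal illustration, not Partek Flow's implementation. It assumes Phred+33-encoded quality strings (Sanger / Illumina 1.8+) and trims bases whose score is less than or equal to the threshold, stopping at the first base that scores strictly greater. `quality_trim` and its `ends` argument are hypothetical names.

```python
from typing import List, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual]


def quality_trim(seq: str, qual: str, threshold: int, ends: str = "both") -> Tuple[str, str]:
    """Trim from the chosen end(s) until the outermost remaining base
    has a quality score greater than `threshold`.

    ends: "5", "3" or "both" (hypothetical parameter). Assumes Phred+33 encoding.
    """
    if ends not in ("5", "3", "both"):
        raise ValueError('ends must be "5", "3" or "both"')
    scores = phred33(qual)
    start, stop = 0, len(seq)
    if ends in ("5", "both"):
        # Walk inward from the 5'-end past every base at or below the threshold.
        while start < stop and scores[start] <= threshold:
            start += 1
    if ends in ("3", "both"):
        # Walk inward from the 3'-end in the same way.
        while stop > start and scores[stop - 1] <= threshold:
            stop -= 1
    return seq[start:stop], qual[start:stop]


if __name__ == "__main__":
    # '#' = Q2, 'I' = Q40, '5' = Q20 (Q20 is not > 20, so it is trimmed too)
    print(quality_trim("ACGTACGTAC", "##IIIII5##", threshold=20))  # ('GTACG', 'IIIII')
```

Unlike the fixed-position trim, the number of bases removed here differs from read to read, which is what makes this trimming dynamic.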

Advanced options

In some cases, base trimming leaves reads so short that they are not recommended for alignment. Partek Flow therefore provides the option to set a Min read length after base trimming; reads shorter than this length are discarded.

...

Figure 6 shows the options available for each Trim bases mode. Note that the default Min read length is 25 bp. For microRNA sequencing data, set Min read length to a smaller value (we recommend 15) to account for the short length of mature microRNAs.

 

 

Figure 6. Trim bases options. A) Trim from 3'-end; B) Trim from 5'-end; C) Trim from both ends; D) Trim based on base quality score
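The Min read length filter can be illustrated with the short Python sketch below. It assumes reads are plain (name, sequence, quality) tuples, and `filter_min_length` is a hypothetical helper. It shows why the 25 bp default would discard a 22-nt mature microRNA read, while the recommended value of 15 keeps it.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical read representation: (name, sequence, quality string)
Read = Tuple[str, str, str]


def filter_min_length(reads: Iterable[Read], min_length: int = 25) -> Iterator[Read]:
    """Yield only reads whose trimmed sequence is at least `min_length` bases long.

    The default of 25 mirrors the Min read length default described above.
    """
    for name, seq, qual in reads:
        if len(seq) >= min_length:  # reads shorter than min_length are discarded
            yield name, seq, qual


if __name__ == "__main__":
    reads = [("r1", "A" * 30, "I" * 30), ("r2", "A" * 22, "I" * 22), ("r3", "A" * 12, "I" * 12)]
    print([name for name, _, _ in filter_min_length(reads)])                 # ['r1']
    print([name for name, _, _ in filter_min_length(reads, min_length=15)])  # ['r1', 'r2']
```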

...

To determine whether microRNA data have already been adapter-trimmed, look at the pre-alignment QA/QC of the raw data, specifically the read length distribution. If the read length distribution peaks at approximately 22-23 bases, the data have usually been adapter-trimmed. However, if all reads have the same fixed length, the data are very likely not adapter-trimmed. In that case, obtain the adapter sequence from your vendor or service provider and use the Trim adapter function to trim it away.
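The rule of thumb above can be expressed as a rough heuristic over the read length distribution. The Python sketch below is hypothetical: `adapter_trim_status` and its three verdicts are illustrative names, not Partek Flow output. It only encodes the two cases described above (a single fixed length, or a peak at 22-23 bases) and leaves everything else to manual inspection.

```python
from collections import Counter
from typing import Iterable


def length_distribution(seqs: Iterable[str]) -> Counter:
    """Count reads by length (the read length distribution shown in QA/QC)."""
    return Counter(len(s) for s in seqs)


def adapter_trim_status(seqs: Iterable[str]) -> str:
    """Hypothetical heuristic for microRNA data:
    - every read has one fixed length -> probably NOT adapter-trimmed
    - distribution peaks at 22-23 nt  -> probably adapter-trimmed
    - anything else                   -> inspect the distribution manually
    """
    dist = length_distribution(seqs)
    if not dist:
        raise ValueError("no reads")
    if len(dist) == 1:
        return "likely not adapter-trimmed"
    peak_length, _ = dist.most_common(1)[0]
    if peak_length in (22, 23):
        return "likely adapter-trimmed"
    return "inconclusive"


if __name__ == "__main__":
    untrimmed = ["ACGT" * 9] * 100  # every read is 36 bases
    trimmed = ["A" * 22] * 50 + ["A" * 23] * 30 + ["A" * 21] * 10
    print(adapter_trim_status(untrimmed))  # likely not adapter-trimmed
    print(adapter_trim_status(trimmed))    # likely adapter-trimmed
```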

 
Partek Flow software wraps Cutadapt [2], a widely used tool for adapter trimming. It can be used to trim adapter sequences in nucleotide-space data as well as color-space data.

...

The first section of advanced options is Adapter options, which configures how the adapter sequence is matched to each read. These options include the maximum allowed error rate, the number of times the adapter is matched, the minimum number of overlapping bases, whether Ns (ambiguous bases) are allowed in the adapter, and whether N is treated as a wildcard. Hover the mouse cursor over the info button next to each parameter for more information.
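To show how the maximum error rate, minimum overlap and N-as-wildcard settings interact, here is a deliberately simplified Python sketch of 3'-adapter matching. It is not Cutadapt's algorithm: Cutadapt aligns with insertions and deletions, whereas this sketch counts mismatches only and returns the leftmost acceptable match. All function names are hypothetical.

```python
def find_3prime_adapter(read: str, adapter: str, max_error_rate: float = 0.1,
                        min_overlap: int = 3, adapter_n_wildcard: bool = True) -> int:
    """Return the 0-based position where the adapter starts in `read`, or -1.

    Simplified, mismatch-only matching. The adapter may run off the 3'-end of
    the read, but at least `min_overlap` bases must overlap, and the number of
    mismatches must not exceed max_error_rate * overlap length.
    """
    for pos in range(len(read)):
        overlap = min(len(adapter), len(read) - pos)
        if overlap < min_overlap:
            break  # later positions overlap even less
        allowed = int(max_error_rate * overlap)
        mismatches = 0
        for r, a in zip(read[pos:pos + overlap], adapter[:overlap]):
            if adapter_n_wildcard and a == "N":
                continue  # an N in the adapter matches any read base
            if r != a:
                mismatches += 1
                if mismatches > allowed:
                    break
        if mismatches <= allowed:
            return pos
    return -1


def trim_3prime_adapter(read: str, adapter: str, **kwargs) -> str:
    """Remove the adapter, and everything after it, from the 3'-end of `read`."""
    pos = find_3prime_adapter(read, adapter, **kwargs)
    return read if pos == -1 else read[:pos]


if __name__ == "__main__":
    adapter = "TGGAATTCTCGG"  # example sequence; use the adapter from your provider
    read = "TAGCTTATCAGACTGATGTTGA" + adapter  # 22-nt insert followed by the adapter
    print(trim_3prime_adapter(read, adapter))  # TAGCTTATCAGACTGATGTTGA
```

Raising `min_overlap` makes the sketch ignore short adapter fragments at the very end of a read, which reduces false matches at the cost of leaving a few adapter bases behind.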

 

The second section of advanced options is Filtering options, which discards adapter-trimmed reads that are shorter than the minimum read length. Very short reads tend to produce non-unique alignments, so they are removed.
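Because Partek Flow wraps Cutadapt, it can help to see roughly equivalent settings on a standalone Cutadapt command line. The Python sketch below only builds the argument list. The flags used (`-a`, `-e`, `-n`, `-O`, `-m`, `-N`, `-o`) are standard Cutadapt options, but how Partek Flow maps its own Adapter and Filtering options onto them is an assumption, and `cutadapt_command` is a hypothetical helper.

```python
import shlex
from typing import List


def cutadapt_command(adapter: str, fastq_in: str, fastq_out: str, *,
                     min_length: int, error_rate: float = 0.1, times: int = 1,
                     min_overlap: int = 3, adapter_wildcards: bool = True) -> List[str]:
    """Build a Cutadapt command line resembling the Adapter and Filtering options.

    Assumption: this mirrors, but is not, the command Partek Flow runs internally.
    """
    cmd = [
        "cutadapt",
        "-a", adapter,           # 3' adapter sequence
        "-e", str(error_rate),   # maximum allowed error rate
        "-n", str(times),        # remove up to this many adapter occurrences
        "-O", str(min_overlap),  # minimum overlap between read and adapter
        "-m", str(min_length),   # discard reads shorter than this after trimming
    ]
    if not adapter_wildcards:
        cmd.append("-N")         # do not treat N in the adapter as a wildcard
    cmd += ["-o", fastq_out, fastq_in]
    return cmd


if __name__ == "__main__":
    # Example values only; 15 follows the microRNA recommendation given earlier for base trimming.
    cmd = cutadapt_command("TGGAATTCTCGG", "reads.fastq", "trimmed.fastq", min_length=15)
    print(shlex.join(cmd))
    # cutadapt -a TGGAATTCTCGG -e 0.1 -n 1 -O 3 -m 15 -o trimmed.fastq reads.fastq
```

To run it on a system where Cutadapt is installed, pass the list to `subprocess.run(cmd, check=True)`.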

...

Subsample FASTQ is only available for unaligned reads in FASTQ format. To run this function, select the Unaligned Reads data node and choose Subsample FASTQ from the Pre-alignment tools section of the menu. Then specify how many reads to keep for every n reads. For example, if the user specifies "Keep one read for every 10 reads" (Figure 9), the program keeps 1 read out of every 10, which is equivalent to keeping 10% of the data.

 

Figure 9. Subsample FASTQ page, set to obtain a subset of the raw data by keeping one read for every 10 reads
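The "keep one read for every n reads" rule can be sketched in Python as shown below. The sketch assumes unwrapped, 4-line FASTQ records and keeps the first read of each block of n. Which read within each block Partek Flow actually keeps is not stated here, so that choice is an assumption, and both function names are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, '+' line, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def subsample_fastq(handle: TextIO, every: int) -> Iterator[Record]:
    """Keep one read out of every `every` reads (assumed: the first of each block)."""
    if every < 1:
        raise ValueError("every must be >= 1")
    for i, record in enumerate(read_fastq(handle)):
        if i % every == 0:
            yield record


if __name__ == "__main__":
    fastq = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(25))
    kept = list(subsample_fastq(io.StringIO(fastq), every=10))
    print([r[0] for r in kept])  # ['@read0', '@read10', '@read20']
```

With `every=10` on a file of 100 reads, 10 reads are kept, matching the 10% figure in the example above.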

...

  1. Del Fabbro C, Scalabrin S, Morgante M, Giorgi FM. An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS ONE. 2013;8(12):e85024.
  2. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10-12.

 
