...

Because a read can align to more than one location, the number of alignments may be greater than the number of reads. Conversely, because some reads may be unaligned, the number of alignments may be less than the number of reads in the BAM/SAM file (Figure 1).

Figure 1. Counting reads and alignments for paired-end reads. Sequencing read 1 is imported as one paired-end read, with two alignments. Sequencing read 2 is imported as one paired-end read, with three alignments.

PGS shows the number of alignments per sample in the “parent” spreadsheet of an RNA-seq project, while the alignments and reads per sample are reported in the mapping_summary spreadsheet. The alignment_counts spreadsheet contains the number of alignments per read in each sample; it can be invoked through the QA/QC section of the RNA-seq workflow.
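The difference between counting reads and counting alignments can be illustrated with a small sketch. The record layout used here (a read name plus a flag saying whether the record is an alignment) is a hypothetical simplification for illustration only; it is not the PGS import logic or the BAM/SAM record structure.

```python
from collections import Counter

def summarize(records):
    """Summarize (read_name, is_aligned) records, one tuple per record.

    Returns (number of reads, number of alignments,
             histogram mapping alignments-per-read -> number of reads).
    """
    per_read = Counter()
    for name, is_aligned in records:
        # Adding 0 still registers the read, so unaligned reads are counted as reads.
        per_read[name] += 1 if is_aligned else 0
    n_reads = len(per_read)
    n_alignments = sum(per_read.values())
    histogram = Counter(per_read.values())
    return n_reads, n_alignments, histogram

# Hypothetical sample: read1 has two alignments, read2 has three, read3 is unaligned.
records = [
    ("read1", True), ("read1", True),
    ("read2", True), ("read2", True), ("read2", True),
    ("read3", False),
]
n_reads, n_alignments, histogram = summarize(records)
```

In this toy sample there are more alignments than reads because of the multiply aligned reads, while the unaligned read adds to the read count only.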

...

Exonic: A read is labeled exonic if any one of its alignments is completely contained within an exon as defined by the database. Even a single-base shift of the read relative to the exon means that the read is not called exonic and instead falls into the category ‘partially overlaps exon’. If the alignments are strand-specific, then the strand of the alignment must also agree with the strand of the transcript.
Partially overlaps exon: A read is labeled ‘partially overlaps exon’ if any of its alignments overlaps an exon but extends outside the exon by one base pair or more.
Intronic: A read is labeled intronic if any one of its alignments maps completely within an intron, but none of the alignments are exonic (either fully or in part). If the alignments are strand-specific, then the strand of the alignment must also agree with the strand of the transcript.
Reads between genes: A read is labeled ‘between genes’ if none of its alignments overlap a gene.
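The four categories above can be sketched for a single transcript as follows. This is a minimal illustration under stated assumptions, not the PGS implementation: coordinates are half-open intervals on one chromosome, strand and spliced alignments are ignored, and the order in which the categories are tested (exonic first, then partial overlap, then intronic) is an assumption. The names `classify_read`, `contained`, and `overlaps` are hypothetical.

```python
def contained(a, b):
    """True if interval a lies completely within interval b (half-open [start, end))."""
    return b[0] <= a[0] and a[1] <= b[1]

def overlaps(a, b):
    """True if intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def classify_read(alignments, exons):
    """Classify a read from its alignments against one transcript.

    alignments: list of (start, end) intervals, one per alignment of the read.
    exons: sorted, non-overlapping list of (start, end) exon intervals.
    """
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    gene = (exons[0][0], exons[-1][1])
    if any(contained(a, e) for a in alignments for e in exons):
        return "exonic"
    if any(overlaps(a, e) for a in alignments for e in exons):
        return "partially overlaps exon"
    if any(contained(a, i) for a in alignments for i in introns):
        return "intronic"
    if not any(overlaps(a, gene) for a in alignments):
        return "between genes"
    return "other"  # not expected in this simplified single-transcript model

# Hypothetical transcript with two exons and one intron.
exons = [(100, 200), (300, 400)]
labels = [
    classify_read([(120, 170)], exons),              # inside the first exon
    classify_read([(220, 280)], exons),              # inside the intron
    classify_read([(180, 230)], exons),              # straddles an exon boundary
    classify_read([(500, 550)], exons),              # outside the gene
    classify_read([(500, 550), (120, 170)], exons),  # one of two alignments is exonic
]
```

The last call shows the "any one of its alignments" rule: a read with one alignment outside the gene and one inside an exon is still labeled exonic.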

Figure 2. Mapping reads to transcripts. A transcript (blue) contains exonic (boxes) and intronic regions (the line joining the boxes). Sequencing reads (light blue) are assigned according to the positions they map to. 1: exonic (fully overlaps an exon), 2: intronic (fully contained within an intron), 3 & 4: partially overlap exon, 5: between genes.

RPKM Scaling

The standard output of the quantification step includes raw read counts and scaled read counts for every gene and transcript in each sample. The scaling method currently applied is reads per kilobase of exon model per million mapped reads (RPKM) (Mortazavi et al. Nat Methods 2008). It scales the abundance estimates using exon length and millions of mapped reads and is calculated according to the formula below.
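As an illustration of the scaling, the commonly cited RPKM definition divides the read count by the exon length in kilobases and by the number of mapped reads in millions. The sketch below assumes that definition; the function name `rpkm` and its parameter names are hypothetical, and the input numbers are made up for the example.

```python
def rpkm(read_count, exon_length_bp, mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count: reads assigned to the gene or transcript.
    exon_length_bp: total exon length of the gene or transcript, in base pairs.
    mapped_reads: total number of mapped reads in the sample.
    """
    if exon_length_bp <= 0 or mapped_reads <= 0:
        raise ValueError("exon length and mapped reads must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) factors.
    return read_count * 1e9 / (exon_length_bp * mapped_reads)

# Hypothetical transcript: 500 reads, 2,000 bp of exon, 10 million mapped reads.
value = rpkm(500, 2_000, 10_000_000)
```

Doubling the exon length or the sequencing depth while keeping the read count fixed halves the scaled value, which is what makes samples and transcripts of different sizes comparable.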

...

Compatible junction reads are reported in a separate block of columns in the transcripts spreadsheet (“junction RPKM”).

Figure 3. Number of reads per transcript (rows) and sample (columns). The “Transcripts” spreadsheet provides the raw number of compatible reads as well as the scaled number of reads (RPKM) for each sample. In addition, Partek® Genomics Suite™ gives the scaled number of reads for compatible junction reads and for incompatible reads. For definitions, please refer to the text.

Paired-End Data Scenarios

...

For genes with multiple transcripts, a read can be counted as compatible with some transcripts and as incompatible with other transcripts of the same gene (Figure 4). Please note that this concept holds for single-end reads as well.

Figure 4. Compatibility of reads corresponding to genes with multiple transcripts. First in pair maps to all three transcripts, while second maps to transcripts A and B. The paired-end read is compatible with transcripts A and B, but is not compatible with transcript C. Although the picture shows paired-end reads, the same rules apply for single-end reads.
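The situation in Figure 4 can be mimicked with set operations. The sketch below is a simplification that treats a paired-end read as compatible with a transcript only when both mates map to it; the full compatibility rules are given in the text, and the function name `compatible_transcripts` is hypothetical.

```python
def compatible_transcripts(first_mate, second_mate=None):
    """Return the transcripts a read is compatible with.

    first_mate, second_mate: sets of transcript names each mate maps to.
    For single-end reads, pass only first_mate.
    """
    first = set(first_mate)
    if second_mate is None:
        return first
    # A paired-end read needs both mates to agree with the transcript.
    return first & set(second_mate)

all_transcripts = {"A", "B", "C"}
# First in pair maps to A, B and C; second in pair maps to A and B only.
compatible = compatible_transcripts({"A", "B", "C"}, {"A", "B"})
incompatible = all_transcripts - compatible
```

The same read therefore contributes to the compatible counts of transcripts A and B and to the incompatible count of transcript C.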

Furthermore, every read that has at least one alignment contributes to the denominator of the RPKM (millions of mapped reads per sample), regardless of whether or not the read is compatible with any transcript.

With respect to the gene-level summarization, the user can decide whether or not to consider intronic reads (both single- and paired-end) as compatible with the gene (Figure 5). When intronic reads are counted as compatible, the entire gene is effectively treated as one giant exon. If one end of a paired-end read falls within a gene and the other end does not, the read is not included in the gene-level summarization.

Figure 5. Configuring the mRNA quantification. Gene-level result can report intronic reads as compatible or incompatible with genes. The option applies to both single-end and paired-end reads.
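The effect of this option can be sketched as follows. This is an illustration under assumptions, not the PGS algorithm: intervals are half-open, strand and spliced alignments are ignored, and with the option switched off each end is simply required to lie within an exon. The function `counts_for_gene` and its `include_intronic` parameter are hypothetical names.

```python
def within(a, b):
    """True if interval a lies completely within interval b (half-open [start, end))."""
    return b[0] <= a[0] and a[1] <= b[1]

def counts_for_gene(ends, exons, include_intronic):
    """Decide whether a read enters the gene-level summarization.

    ends: list of (start, end) intervals, one per read end (one for single-end).
    exons: sorted list of (start, end) exon intervals of the gene.
    include_intronic: if True, the gene is treated as one giant exon.
    """
    gene = (exons[0][0], exons[-1][1])
    if include_intronic:
        # Every end must fall within the gene span, intron or exon alike.
        return all(within(e, gene) for e in ends)
    # Otherwise every end must fall within some exon (simplifying assumption).
    return all(any(within(e, x) for x in exons) for e in ends)

exons = [(100, 200), (300, 400)]
exon_plus_intron = [(120, 170), (220, 280)]  # one end exonic, one end intronic
one_end_outside = [(120, 170), (450, 500)]   # second end falls outside the gene

with_option = counts_for_gene(exon_plus_intron, exons, include_intronic=True)
without_option = counts_for_gene(exon_plus_intron, exons, include_intronic=False)
outside = counts_for_gene(one_end_outside, exons, include_intronic=True)
```

A pair with one end outside the gene is excluded regardless of the setting, which matches the rule stated above.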

Unexplained regions

The unexplained regions portion of quantification considers every read that is “not compatible” with any transcript. It is a three-step process:

...

When interpreting the unexplained reads, keep in mind that these are the reads that are not compatible with the applied transcript model. If the model is changed, some reads will become compatible and will no longer be labeled “unexplained”. Figure 6 shows such an example. The depicted regions map just downstream of the human LONP2 gene as defined by RefSeq and are hence flagged as unexplained. However, overlaying the AceView transcripts makes it apparent that mapping to AceView would yield a different result.

Figure 6. Regions track in the Partek Genome Viewer. Regions (colored boxes) detected in the four samples contain sequencing reads that are not compatible with the RefSeq database (upper transcripts track) but are compatible with at least one of the exons of the same gene defined by the AceView database (lower transcripts track).

Additional assistance
