RNA-seq mRNA Quantification

We are now ready to detect differentially expressed genes in our dataset. To do this, we will use the mRNA quantification task in the Analyze Known Genes section of the RNA-seq workflow. mRNA quantification creates spreadsheets showing expression at exon, transcript, and gene levels; identifies transcripts that are differentially expressed or spliced across all samples; and reports raw and normalized reads for each sample.

Please note that the normalization method used by Partek Genomics Suite is Reads Per Kilobase per Million mapped reads (RPKM) (Mortazavi et al. 2008). In brief, RPKM is calculated in three steps. First, the total number of reads in a sample is divided by one million to give a per million scaling factor for that sample. Second, the read count for each feature (exon, transcript, or gene) is divided by the per million scaling factor to normalize for sequencing depth, giving a reads per million value. Third, the reads per million value is divided by the length of the feature in kilobases to normalize for feature size.
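The three steps described above can be sketched in a few lines of Python. This is only an illustration of the RPKM arithmetic, not the code used by Partek Genomics Suite; the function name rpkm, its parameter names, and the example numbers are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_reads):
    """Illustrative RPKM for one feature in one sample (hypothetical helper)."""
    if feature_length_bp <= 0 or total_reads <= 0:
        raise ValueError("feature length and total reads must be positive")
    per_million = total_reads / 1_000_000          # per million scaling factor
    reads_per_million = read_count / per_million   # normalize for sequencing depth
    length_kb = feature_length_bp / 1_000          # feature length in kilobases
    return reads_per_million / length_kb           # normalize for feature size


# Made-up example: 500 reads on a 2,000 bp gene in a sample with 20 million reads.
print(rpkm(500, 2000, 20_000_000))  # prints 12.5
```

In this made-up example, 500 reads in a 20 million read sample give 25 reads per million, and dividing by the 2 kb feature length gives an RPKM of 12.5.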

Select mRNA quantification in the Analyze Known Genes section of the RNA-seq workflow

The RNA-Seq Quantification dialog will appear (Figure 1).

Select RefSeq Transcripts 2014-01-03 from the mRNA section of the Specify a database of genomic features to quantify panel of the dialog

Your choices in the Configure the test panel of the dialog depend on the design and aims of your experiment. A detailed description of each option can be viewed by selecting the icon next to it.

For Strand-specificity: select No

Your choice here depends on the method used for sample preparation. A directional mRNA-seq sample preparation protocol only synthesizes the first strand of cDNA, whereas other methods reverse transcribe the mRNA into double-stranded cDNA. If double-stranded cDNA has been synthesized, the sequencer reads sequences from both the forward and reverse strands but does not discriminate between them, eliminating strand information. When strand information is preserved, it is possible for paired-end sequences to come from a combination of the forward and reverse strands. If in doubt, select Auto-detect from the drop-down list. The data for this tutorial did not preserve strand information, so we selected No.

For In the gene-level result report intronic reads as compatible with the gene?, select No

Selecting Yes would include intronic reads in the gene-level results, which might be useful for discovering unannotated transcripts for known genes, and also includes introns in the RPKM calculation for the gene-level results.

For Require strict paired-end compatibility, select No

Selecting Yes would require that two alignments from the same read map to the same transcript to be considered compatible. However, the data set used in this tutorial consists of single-end reads, so this option does not apply.

For Report results with no reads from any sample?, select No

Selecting Yes would include all the genes/transcripts/exons in the transcriptome, even if there are no reads for that feature from any sample.

Make sure Report unexplained regions with more than ___ reads is selected and specify 5 as the number of reads

This option will create a spreadsheet that includes all regions with a specified number of reads that map to the genome, but not to any feature included in the selected database of genomic features.

Select Report exon-level results

If selected, spreadsheets will be created describing expression at the exon level.

Your RNA-Seq Quantification dialog should now be configured as shown (Figure 1). Descriptions of the spreadsheets that can be created by mRNA Quantification can be viewed by selecting Describe results to bring up the Quantification Result Help dialog.

Figure 1. Configuring the RNA-Seq Quantification dialog

Select OK to perform the RNA-seq quantification

Reads will now be assigned to individual transcripts of a gene based on the Expectation/Maximization (E/M) algorithm (Xing et al. 2006). In Partek Genomics Suite software, the E/M algorithm is modified to accept paired-end reads, junction aligned reads, and multiple aligned reads if these are present in your data. For a detailed description of the E/M algorithm, refer to the RNA-Seq white paper (Help > On-line Tutorials > White Papers). Several spreadsheets containing the analyzed results will be generated. Progress bars in the lower left-hand corner of the RNA-Seq Quantification dialog and the main window will update as the data is analyzed.
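To give a feel for how an E/M approach can divide reads among the transcripts of a gene, here is a minimal, generic sketch in Python. It is not the modified algorithm implemented in Partek Genomics Suite, nor the exact formulation of Xing et al.; it ignores transcript length, paired-end information, and junction reads. The function name, its parameters, and the toy data are all hypothetical.

```python
def em_transcript_abundance(read_compat, n_transcripts, n_iter=100):
    """Generic E/M sketch: estimate relative transcript abundances.

    read_compat: one list per read, holding the indices of the
    transcripts that read is compatible with (hypothetical input format).
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # start from equal abundances
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            if total == 0:
                continue  # read is compatible only with zero-abundance transcripts
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: re-estimate abundances from the expected read counts.
        assigned = sum(expected)
        if assigned == 0:
            break  # nothing could be assigned; keep the current estimates
        theta = [count / assigned for count in expected]
    return theta


# Toy data: 6 reads unique to transcript 0, 2 unique to transcript 1,
# and 4 reads compatible with both transcripts.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
abundances = em_transcript_abundance(reads, n_transcripts=2)
print([round(a, 3) for a in abundances])
```

For this toy data the estimates settle at about 0.75 and 0.25: the four ambiguous reads are shared between the two transcripts in the same 3:1 ratio suggested by the uniquely assigned reads.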

If you have not disabled it, the Quantification Result Help dialog will appear. Select Close

The Analysis tab now shows the spreadsheets created by mRNA Quantification in the spreadsheet tree as child spreadsheets of 1 (RNA-seq) (Figure 2).

Figure 2. Viewing the results of mRNA Quantification

Data on features (genes, transcripts, and exons) is presented before and after normalization as _reads and _rpkm spreadsheets. In these spreadsheets, samples are listed one per row and features are in columns, with raw read counts in the _reads spreadsheets and normalized counts in the _rpkm spreadsheets. _rpkm spreadsheets can be used to perform differential expression analysis using ANOVA. It may also be useful to view how samples group together using a PCA plot. Select View > Scatter Plot from the toolbar or use the corresponding icon in the quick action bar to create a PCA plot from the selected spreadsheet.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
