
Raw read counts are generated after quantification for each feature on all samples. These read counts need to be normalized prior to differential expression detection to ensure that samples are comparable.

This chapter covers the implementation of each normalization method. The Normalize counts option is available on the context-sensitive menu when a data node containing quantified data or a count matrix is selected.

If your experimental design includes a sample or a group of samples serving as a baseline control, you can normalize the experimental samples by subtracting or dividing by the baseline sample(s) using the Normalize to baseline task in Partek Flow. For example, in PCR experiments, the delta Ct values of control samples are subtracted from the delta Ct values of experimental samples to obtain delta-delta Ct values for the experimental samples.

The Normalize to baseline option is available in the Normalization and Scaling section of the context-sensitive menu (Figure 1) when you select any quantified count matrix data node or an imported count matrix:

Gene counts
Transcript counts
MicroRNA counts
Cufflinks quantification
Quantification


Figure 1. When a data node containing quantified data (a count matrix) is selected, Normalize counts becomes available on the context-sensitive menu

The output has the same format as the input data, and the resulting node is called Normalized counts. This data node can be selected and normalized further using the same task.

Selecting Methods

Select whether you want your data normalized on a per-sample or per-feature basis (Figure 2). Some transformations, such as log transformation, are performed on each value independently of the others, so the result is identical regardless of your choice.

Figure 2. Transformation can be done on samples or on features

The following normalization methods will generate different results depending on whether the transformation was performed on samples or on features:

Divide by mean, median, Q1, Q3, std dev, sum

Subtract mean, median, Q1, Q3, std dev, sum

Quantile normalization
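The difference between the two modes can be illustrated with a minimal sketch of "Divide by mean". The matrix layout (rows are samples, columns are features), the function names, and the toy values are assumptions made for this illustration; they are not part of Partek Flow.

```python
from statistics import mean

# Hypothetical toy matrix: rows are samples, columns are features.
counts = [
    [10.0, 20.0, 30.0],  # sample A
    [40.0, 50.0, 60.0],  # sample B
]

def divide_by_mean_per_sample(matrix):
    """Divide every value by the mean of its own sample (row)."""
    return [[x / mean(row) for x in row] for row in matrix]

def divide_by_mean_per_feature(matrix):
    """Divide every value by the mean of its own feature (column)."""
    col_means = [mean(col) for col in zip(*matrix)]
    return [[x / m for x, m in zip(row, col_means)] for row in matrix]

per_sample = divide_by_mean_per_sample(counts)
per_feature = divide_by_mean_per_feature(counts)
```

The two results differ because the divisor is taken along a different axis: each sample has its own mean in the first case, each feature in the second.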

Note that each task can only perform normalization on samples or on features. If you wish to perform both transformations, run two normalization tasks successively.

To normalize the data, click a method in the left panel, then drag and drop it into the right panel. Add all the normalization methods you wish to perform. Alternatively, click the green plus button on a method to add it. Multiple methods can be added to the right panel, and they will be processed in the order they are listed. You can change the order by dragging a method up or down. To remove a method from the Normalization order panel, click the minus button to the right of the method. Click Finish when you are done choosing the normalization methods.
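Because methods are processed in the order they are listed, the order can change the result. The following is a minimal sketch of an ordered pipeline; the helper names (add_constant, log_base, run_pipeline) and the toy values are hypothetical and only illustrate the idea.

```python
import math

def add_constant(c):
    """Return a step that adds the constant c to a value."""
    return lambda x: x + c

def log_base(b):
    """Return a step that takes the base-b logarithm of a value."""
    return lambda x: math.log(x, b)

def run_pipeline(values, steps):
    """Apply each step to every value, in the order the steps are listed."""
    for step in steps:
        values = [step(v) for v in values]
    return values

raw = [0.0, 3.0, 7.0]

# Adding a constant first keeps the log step away from log(0).
add_then_log = run_pipeline(raw, [add_constant(1.0), log_base(2.0)])
```

Reversing the two steps in this sketch would fail on the zero value, since the logarithm of 0 is undefined.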

Recommended Methods

For some data nodes, recommended methods are available:

Data nodes resulting from Quantify to annotation model (Partek E/M) or Quantify to reference (Partek E/M) contain raw read counts; the recommendation is Total Count, Add 0.0001
Cufflinks quantification data nodes contain FPKM-normalized read counts; the recommendation is Add 0.0001

If available, the Recommended button will appear. Clicking the button will populate the right panel (Figure 3).

Figure 1. Normalize to baseline is available in the toolbox

There are three options to choose the baseline samples:

use all samples
use a group
use matched pairs

Use all samples to create baseline

To normalize the data to all samples, choose whether to calculate the baseline as the mean or the median of all samples for each feature, choose subtract baseline or ratio to baseline as the normalization method (Figure 2), and click Finish.
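As a rough illustration of this option, the sketch below computes a per-feature baseline from all samples and then subtracts it or divides by it. The function name, argument names, and matrix layout (rows are samples, columns are features) are assumptions for this example, not Partek Flow's implementation.

```python
from statistics import mean, median

def normalize_to_all(matrix, baseline="mean", method="subtract"):
    """Normalize each feature to the mean or median of all samples."""
    if baseline not in ("mean", "median"):
        raise ValueError("baseline must be 'mean' or 'median'")
    if method not in ("subtract", "ratio"):
        raise ValueError("method must be 'subtract' or 'ratio'")
    stat = mean if baseline == "mean" else median
    # One baseline value per feature (column), computed across all samples.
    base = [stat(col) for col in zip(*matrix)]
    if method == "subtract":
        return [[x - b for x, b in zip(row, base)] for row in matrix]
    return [[x / b for x, b in zip(row, base)] for row in matrix]

# Hypothetical toy data: three samples, two features.
data = [[2.0, 8.0], [4.0, 16.0], [6.0, 24.0]]
subtracted = normalize_to_all(data, baseline="mean", method="subtract")
ratios = normalize_to_all(data, baseline="median", method="ratio")
```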

Figure 3. Normalization using Partek’s recommended method

Normalization Methods

Below is the notation that will be used to explain each method:

Symbol	Meaning
S	Sample (or cell for single cell data node)
F	Feature
X_sf	Value of sample S from feature F (if normalization is performed on a quantification data node, this would be the raw read counts)
TX_sf	transformed value of X_sf
C	Constant value
b	Base of log

Absolute value
TX_sf = | X_sf |
Add
TX_sf = X_sf + C
A constant value C needs to be specified.
Antilog
TX_sf = b^(X_sf)
A log base value b needs to be specified from the drop-down list; any positive number can be specified when Custom value is chosen.
CLR (centered log ratio)
TX_sf = ln((X_sf + 1) / geom(X_sf + 1) + 1)
geom is the geometric mean over either the observation or the feature. We recommend performing this normalization on observations for CITE-Seq data.
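The element-wise methods and the CLR formula above can be sketched as follows. This is only an illustration of the formulas as written; the function names are hypothetical, and the CLR helper assumes it receives the values of a single observation (or a single feature).

```python
import math

def absolute(x):
    """Absolute value: TX = |X|."""
    return abs(x)

def add(x, c):
    """Add: TX = X + C."""
    return x + c

def antilog(x, b):
    """Antilog: TX = b^X."""
    return b ** x

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def clr(values):
    """CLR as written above: ln((X + 1) / geom(X + 1) + 1)."""
    shifted = [v + 1 for v in values]
    g = geometric_mean(shifted)
    return [math.log(s / g + 1) for s in shifted]

# Hypothetical counts for one observation.
transformed = clr([1.0, 3.0])
```

Shifting by 1 before taking the geometric mean keeps the logarithm defined when a count is 0.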

...

1. Normalize the reads by the length of the feature to generate reads per kilobase (RPK)
  RPK_sf = X_sf / L_f
2. Sum all the RPK_sf values in a sample
  PRK_s = ∑_{f=1}^{F} RPK_sf
3. Generate a scaling factor for each sample by normalizing the PRK of the sample to the sum PRK of all the samples
  [equation image for the scaling factor K_s not available],
  where TR is the total reads across all samples
4. Divide raw reads by the scaling factor to get TPM
  TX_sf = X_sf / K_s
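Since the scaling-factor equation in step 3 is not shown here, the sketch below does not reproduce it. Instead, it illustrates the conventional TPM definition described by Wagner et al., in which each feature's RPK is divided by the sample's RPK total and multiplied by one million. This may differ from the scaling used by Partek Flow. The function name is hypothetical, and feature lengths are assumed to be given in kilobases.

```python
def tpm_conventional(counts, lengths_kb):
    """Conventional TPM for one sample (an assumption, not Partek's formula).

    counts: raw reads per feature for a single sample.
    lengths_kb: feature lengths in kilobases, in the same order.
    """
    # Step 1: reads per kilobase for each feature.
    rpk = [x / length for x, length in zip(counts, lengths_kb)]
    # Step 2: sum of RPK values in the sample.
    rpk_total = sum(rpk)
    # Conventional scaling: each sample's TPM values sum to one million.
    return [r / rpk_total * 1e6 for r in rpk]

# Hypothetical toy sample with three features.
tpm_values = tpm_conventional([10.0, 20.0, 30.0], [1.0, 2.0, 3.0])
```

In this toy sample every feature has the same read density, so all three features receive the same TPM value.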

Upper quartile
The method is the same as the one used in the LIMMA package [7].
The calculation can be summarized as follows:

1. Remove all the features that have 0 reads in all samples.
2. Calculate the effective library size per sample: effective library size = (raw library size in millions) × (upper quartile of the sample) / (geometric mean of the upper quartiles of all samples)
3. Get the normalized counts by dividing the raw counts of each feature by the effective library size of the respective sample.
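The three steps above can be sketched as follows. This is a minimal illustration, not the LIMMA or Partek Flow code: it assumes that the raw library size is the sum of a sample's counts, that the upper quartile is taken over the features remaining after step 1, and that quartiles use the "inclusive" method of Python's statistics module. The function name is hypothetical.

```python
import math
from statistics import quantiles

def upper_quartile_normalize(matrix):
    """matrix: rows are samples, columns are features (raw counts)."""
    # Step 1: keep only features with a non-zero count in at least one sample.
    keep = [j for j, col in enumerate(zip(*matrix)) if any(v != 0 for v in col)]
    kept = [[row[j] for j in keep] for row in matrix]

    # Upper quartile (Q3) per sample; the quantile method is an assumption.
    uq = [quantiles(row, n=4, method="inclusive")[2] for row in kept]
    # Geometric mean of the upper quartiles (requires every Q3 > 0).
    geo = math.exp(sum(math.log(q) for q in uq) / len(uq))

    # Step 2: effective library size, with the raw library size in millions.
    eff = [sum(row) / 1e6 * q / geo for row, q in zip(kept, uq)]

    # Step 3: divide raw counts by the effective library size of the sample.
    return [[x / e for x in row] for row, e in zip(kept, eff)]

# Hypothetical toy data: the first feature is 0 in every sample.
raw = [
    [0.0, 10.0, 20.0, 30.0, 40.0],
    [0.0, 20.0, 40.0, 60.0, 80.0],
]
normalized = upper_quartile_normalize(raw)
```

A sample whose upper quartile is 0 would need separate handling, because the geometric mean is undefined in that case.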

Normalization Report

The Normalization report includes the Normalization methods used, a Feature distribution table, Box-whisker plots of the Expression signal before and after normalization, and Sample histogram charts before and after normalization. Note that all visualizations are disabled for results with more than 30 samples.

Normalization methods

A summary of the normalization methods performed, listed in the order they were performed.

Feature distribution table

A table that presents descriptive statistics for each sample. The last row contains the grand statistics across all samples (Figure 4).

Figure 2. Use the mean or median of all samples as the baseline to normalize the data

Use a group to create baseline

When there is a subset of samples that serve as the baseline in the experiment, select use group for Choose baseline samples. The specific group should be specified using sample attributes (Figure 3).

Figure 3. Use a subgroup of samples to create baseline to normalize the data

Choose use group, then select the attribute containing the baseline group information. In this example the attribute is Treatment, and the samples in the Control group of the Treatment attribute are used as the baseline. The control samples can be filtered out after normalization by selecting the Remove baseline samples after normalization check box.
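A minimal sketch of group-based baseline normalization follows. It assumes the baseline is the per-feature mean of the baseline group, which is not stated in this section. The function and argument names are hypothetical, and the remove_baseline flag mirrors the Remove baseline samples after normalization check box.

```python
from statistics import mean

def normalize_to_group(matrix, groups, baseline_group,
                       method="subtract", remove_baseline=False):
    """matrix: rows are samples; groups: one attribute value per sample."""
    if method not in ("subtract", "ratio"):
        raise ValueError("method must be 'subtract' or 'ratio'")
    base_rows = [row for row, g in zip(matrix, groups) if g == baseline_group]
    if not base_rows:
        raise ValueError("no samples found in the baseline group")
    # Assumed baseline: per-feature mean of the baseline group.
    base = [mean(col) for col in zip(*base_rows)]

    result = []
    for row, g in zip(matrix, groups):
        if remove_baseline and g == baseline_group:
            continue  # drop baseline samples from the output
        if method == "subtract":
            result.append([x - b for x, b in zip(row, base)])
        else:
            result.append([x / b for x, b in zip(row, base)])
    return result

# Hypothetical toy data with a Treatment attribute.
values = [[2.0, 4.0], [4.0, 8.0], [9.0, 18.0]]
treatment = ["Control", "Control", "Treated"]
treated_only = normalize_to_group(values, treatment, "Control",
                                  method="ratio", remove_baseline=True)
```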

Use matched pairs

When using matched pairs, one sample from each pair serves as the control. An attribute specifying the pairs must be selected in addition to an attribute designating which sample in each pair is the baseline sample (Figure 4).

...


Figure 4. Feature distribution statistic information on each sample and across all the samples

Expression signal

These box-whisker plots show the expression signal distribution for each sample before and after normalization. When you mouse over a box in the plot, a balloon shows detailed percentile information (Figure 5).

Figure 5. Box-whisker plot displays expression signal distribution for each sample

Sample histogram

A histogram is displayed for the data before and after normalization. Each line is a sample; the X-axis is the range of the data in the node and the Y-axis is the frequency of values within each interval. When you mouse over a circle, which represents the center of an interval, detailed information appears in a balloon (Figure 6). It includes:

The sample name
The range of the interval; “[” represents inclusive and “)” represents exclusive
The frequency of values within the interval
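The half-open interval notation can be illustrated with a small binning sketch. The equal-width bins, the number of bins, and the choice to count the maximum value in the last interval are assumptions made for this example; the report may bin the data differently.

```python
def histogram(values, n_bins):
    """Count values in equal-width intervals [lower, upper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form intervals")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Assumption: the maximum value is counted in the last interval.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]
    return edges, counts

# Hypothetical values for one sample.
sample_values = [float(v) for v in range(11)]  # 0.0, 1.0, ..., 10.0
edges, counts = histogram(sample_values, 5)
```

A value that falls exactly on a boundary, such as 2.0 here, is counted in the interval that starts at that boundary, [2.0, 4.0), and not in the one that ends there.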

Figure 6. Sample histogram. Mousing over shows detailed information about the interval. This includes sample name, range and frequency of the selected sample.

References

1. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics. 2003; 19(2): 185-193.
2. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008; 5(7): 621-628.
3. Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010; 11: 94.
4. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010; 11: R25.
5. Dillies MA, Rau A, Aubert J et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013; 14(6): 671-683.
6. Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abundance using RNA-seq data. Theory Biosci. 2012; 131(4): 281-285.
7. Ritchie ME, Phipson B, Wu D et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43(15): e97.

Figure 4. Designate the pairs and the baseline sample in each pair to normalize by matched pairs

After normalization, all values for the control sample will be either 0 or 1 depending on the normalization method chosen, so we recommend removing baseline samples when using matched pairs.
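The sketch below illustrates why the baseline samples become all 0 (subtract) or all 1 (ratio): each baseline sample is normalized against itself. The record layout (name, pair, is_baseline, values) and the function name are hypothetical and only serve this example.

```python
def normalize_matched_pairs(samples, method="subtract", remove_baseline=False):
    """Normalize each sample to the baseline sample of its own pair."""
    if method not in ("subtract", "ratio"):
        raise ValueError("method must be 'subtract' or 'ratio'")
    # Look up the baseline values of each pair.
    baselines = {s["pair"]: s["values"] for s in samples if s["is_baseline"]}
    result = []
    for s in samples:
        if remove_baseline and s["is_baseline"]:
            continue  # drop baseline samples from the output
        base = baselines[s["pair"]]
        if method == "subtract":
            values = [x - b for x, b in zip(s["values"], base)]
        else:
            values = [x / b for x, b in zip(s["values"], base)]
        result.append({"name": s["name"], "values": values})
    return result

# Hypothetical toy data: one pair with a control and a treated sample.
samples = [
    {"name": "P1_control", "pair": "P1", "is_baseline": True, "values": [4.0, 8.0]},
    {"name": "P1_treated", "pair": "P1", "is_baseline": False, "values": [6.0, 4.0]},
]
subtracted = normalize_matched_pairs(samples, method="subtract")
ratios = normalize_matched_pairs(samples, method="ratio")
without_baseline = normalize_matched_pairs(samples, method="ratio",
                                           remove_baseline=True)
```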

The output of Normalize to baseline is a Normalized counts data node.
