PGS Documentation

In ChIP-Seq, genomic DNA is fragmented by restriction digest and target-protein-bound DNA fragments are purified by immunoprecipitation. These purified fragments are between 100 and 500 base pairs long, depending on the protocol; however, because ChIP-Seq uses short-read sequencing (25 to 35 base pair reads) to maximize sequencing depth, only the ends of each fragment will be sequenced. Consequently, with single-end sequencing, reads on the forward and reverse strands will come from opposite ends of the fragments. At a protein-binding site, there will thus be two peaks, one from enrichment of forward strand reads and another from enrichment of reverse strand reads. The average distance between these peaks is termed the effective fragment length. Because the forward and reverse strand peaks are generated from a common set of fragments, the peaks should be roughly symmetrical. By phase shifting the data to the mid-point between the two peaks, using the effective fragment length as a standard, a common read density plot can be created that shows single peaks at binding sites.
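The phase shift described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PGS implementation: read densities are assumed to be plain per-base lists of counts, the function name `combined_density` and the toy values are hypothetical, and each strand is moved by half of the effective fragment length so that both peaks meet at the mid-point.

```python
def combined_density(forward, reverse, fragment_length):
    """Merge per-base strand densities into one profile.

    Forward-strand counts are moved downstream and reverse-strand counts
    upstream, each by about half of the effective fragment length, so the
    two strand peaks meet at the mid-point between them.
    """
    n = len(forward)
    forward_shift = fragment_length // 2
    reverse_shift = fragment_length - forward_shift
    combined = [0] * n
    for position in range(n):
        shifted_forward = position + forward_shift
        if shifted_forward < n:
            combined[shifted_forward] += forward[position]
        shifted_reverse = position - reverse_shift
        if shifted_reverse >= 0:
            combined[shifted_reverse] += reverse[position]
    return combined


# Toy data (hypothetical): one binding site with a forward-strand peak
# at position 100 and a reverse-strand peak at position 250.
forward = [0] * 400
reverse = [0] * 400
forward[100] = 5
reverse[250] = 5

profile = combined_density(forward, reverse, fragment_length=150)
peak_position = max(range(len(profile)), key=profile.__getitem__)
print(peak_position, profile[peak_position])
```

In this toy case the two strand peaks, 150 base pairs apart, collapse into a single peak half-way between them.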

Strand Cross-Correlation uses the symmetrical distribution of forward and reverse strand reads to calculate the effective fragment length (Kharchenko et al., 2008). The Pearson correlation coefficient between the read densities of the forward and reverse strands is calculated after applying phase shifts of between 0 and 500 base pairs. This is visualized with the phase shift range on the x-axis and the corresponding Pearson correlation coefficients on the y-axis (Figure 1). High-quality ChIP-Seq data will give a strong peak on the Strand Cross-Correlation plot at the effective fragment length. When calling peaks, the forward and reverse (or paired-end) reads are each phase-shifted by the effective fragment length to create a combined read density profile.
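As a rough illustration of the calculation, the sketch below computes the Pearson correlation between strand densities for every phase shift from 0 to 500 base pairs and takes the shift with the highest correlation as the effective fragment length. It is a simplified sketch, not the PGS implementation: the densities are assumed to be per-base lists of counts, the function names are hypothetical, and the toy data are constructed with a known 150 base pair offset between strands.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    if variance_x == 0 or variance_y == 0:
        return 0.0
    return covariance / math.sqrt(variance_x * variance_y)


def strand_cross_correlation(forward, reverse, max_shift=500):
    """Correlate forward density at position i with reverse density at i + shift."""
    correlations = []
    for shift in range(max_shift + 1):
        overlap = len(forward) - shift
        correlations.append(pearson(forward[:overlap], reverse[shift:shift + overlap]))
    return correlations


# Toy data (hypothetical): three binding sites, with the forward-strand
# peak 75 bp upstream and the reverse-strand peak 75 bp downstream of each.
genome_length = 2000
true_fragment_length = 150
peak_shape = [1, 2, 3, 2, 1]
forward = [0] * genome_length
reverse = [0] * genome_length
for site in (400, 900, 1500):
    for offset, height in enumerate(peak_shape, start=-2):
        forward[site - true_fragment_length // 2 + offset] += height
        reverse[site + true_fragment_length // 2 + offset] += height

correlations = strand_cross_correlation(forward, reverse, max_shift=500)
effective_fragment_length = max(range(len(correlations)), key=correlations.__getitem__)
print(effective_fragment_length)
```

Plotting `correlations` against the shift values would give the kind of curve described above, with the maximum at the effective fragment length.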

For paired-end sequencing, Strand Cross-Correlation is calculated from the distribution of distances between the paired reads from the ends of each fragment.  
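For paired-end data, the distance distribution can be tabulated directly from the mapped pairs. The sketch below is only an illustration under assumptions: each pair is represented as a hypothetical `(forward_start, reverse_end)` tuple of genomic coordinates, the distance is taken as the span between them, and the mean is used as the summary. The source does not state which summary statistic PGS uses.

```python
from collections import Counter
from statistics import mean

# Toy data (hypothetical): leftmost coordinate of the forward read and
# rightmost coordinate of the reverse read for each sequenced fragment.
read_pairs = [(100, 250), (300, 460), (500, 650), (700, 840)]

# Distance between the paired reads, i.e. the span of each fragment.
distances = [reverse_end - forward_start for forward_start, reverse_end in read_pairs]

# Distribution of distances: how many pairs were observed at each length.
distance_distribution = Counter(distances)

# Assumed summary statistic: the mean distance.
mean_distance = mean(distances)
print(sorted(distance_distribution.items()), mean_distance)
```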

Let's perform Strand Cross-Correlation to identify the effective fragment length we will use when calling read enrichment peaks. 
