PGS Documentation

...

Starting with copy number estimates for each marker (either taken directly from the vendor’s input file or calculated previously), the next step is to create a list of regions where adjacent markers share the same copy number.
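As a rough illustration of what "a list of regions" means, the sketch below collapses runs of adjacent markers with identical copy number into regions. It assumes the per-marker estimates have already been rounded to whole numbers and sorted by position; the tuple layout, the `call_regions` name, and the example data are hypothetical and are not part of Partek Genomics Suite. Real estimates are noisy, which is why the detection algorithms described in this tutorial are needed.

```python
from itertools import groupby

def call_regions(markers):
    """Collapse position-sorted (chromosome, position, copy_number) markers
    into regions of adjacent markers that share the same copy number."""
    regions = []
    # groupby only merges consecutive items, which is exactly "adjacent markers".
    for (chrom, copy_number), run in groupby(markers, key=lambda m: (m[0], m[2])):
        run = list(run)
        regions.append({
            "chrom": chrom,
            "start": run[0][1],
            "end": run[-1][1],
            "copy_number": copy_number,
            "n_markers": len(run),
        })
    return regions

# Invented example data: (chromosome, position, rounded copy number)
markers = [
    ("chr1", 100, 2), ("chr1", 200, 2),
    ("chr1", 300, 3), ("chr1", 400, 3), ("chr1", 500, 3),
    ("chr2", 100, 2),
]
regions = call_regions(markers)
for region in regions:
    print(region)
```

Grouping on both chromosome and copy number keeps a region from running across a chromosome boundary.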

...

There are two algorithms available for copy number region detection: Genomic Segmentation and Hidden Markov Model (HMM). Both algorithms look for trends across multiple adjacent markers. The genomic segmentation algorithm identifies breakpoints in the data, i.e., changes in copy number between two neighboring regions. The HMM algorithm looks for discrete changes between whole-number copy number states (e.g., 0, 1, 2 … with no upper limit) and finds regions with those numbers of copies. Therefore, the HMM algorithm performs better for homogeneous samples where copy numbers can be anticipated, such as clinical syndromes with underlying copy number aberrations. Genomic segmentation is preferable for heterogeneous samples with unpredictable copy numbers, such as cancer, because tumor biopsies often contain “contaminating” healthy tissue and a tumor can contain cells with different genomic aberrations.

Detecting amplifications and deletions with Genomic Segmentation 

...

The Genomic Segmentation task is divided into two steps. In the first step, each region is compared to an adjacent region to determine whether both have the same average copy number and whether a breakpoint can be inserted. This is determined by first using a two-sided t-test to compare the average intensities of adjacent regions and then checking whether the corresponding p-value is below the specified P-value threshold. The genomic size of a region is defined by the number of genomic markers in the region (Minimum genomic markers), while the magnitude of the significant difference between two regions is controlled by Signal to noise, which can be thought of, if simplified, as the difference in copy numbers between the regions. If the t-test is significant, the copy number of the region differs significantly from its nearest neighbors. However, a second step is needed to determine whether the difference is due to amplification or deletion. In this second step, two one-sided t-tests are used to compare the mean copy number in the region with the expected (normal) diploid copy number. For a detailed explanation of the genomic segmentation procedure, please consult our Genomic Segmentation white paper. For more detailed information about fine-tuning the parameters of your copy number analysis, please consult our guide, Optimizing Copy Number Segmentation.
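The two-step logic can be sketched in a few lines of Python. This is a simplified illustration, not the Partek implementation: the function names, the default threshold, and the example values are assumptions, and p-values are taken from a normal approximation (standard library only) rather than from the t distribution that a true t-test would use. The Minimum genomic markers and Signal to noise parameters are not modeled here.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch t statistic for the difference in means of two samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def one_sample_t(a, mu):
    """t statistic for the mean of one sample against a fixed value mu."""
    return (mean(a) - mu) / sqrt(variance(a) / len(a))

def classify_region(region, neighbor, p_threshold=0.001, diploid=2.0):
    """Step 1: two-sided test against the neighboring region.
    Step 2: two one-sided tests against the expected diploid copy number.
    Assumption: p-values use a normal approximation, not the t distribution."""
    z = NormalDist()

    # Step 1: is the region different from its neighbor (two-sided)?
    t = welch_t(region, neighbor)
    p_two_sided = 2 * (1 - z.cdf(abs(t)))
    if p_two_sided >= p_threshold:
        return "no breakpoint"

    # Step 2: is the region mean above or below the diploid value (one-sided)?
    t0 = one_sample_t(region, diploid)
    p_amplified = 1 - z.cdf(t0)   # small when the mean is well above diploid
    p_deleted = z.cdf(t0)         # small when the mean is well below diploid
    if p_amplified < p_threshold:
        return "amplification"
    if p_deleted < p_threshold:
        return "deletion"
    return "unchanged"

# Invented per-marker copy number estimates
neighbor = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0]
gained = [2.9, 3.1, 3.0, 3.2, 2.8, 3.0, 3.1, 2.9, 3.0, 3.1]
lost = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
similar = [2.0, 2.1, 1.9, 2.1, 1.9, 2.0, 2.0, 2.1, 1.9, 2.0]

print(classify_region(gained, neighbor))
print(classify_region(lost, neighbor))
print(classify_region(similar, neighbor))
```

Separating the two steps mirrors the description above: a region can differ significantly from its neighbor without differing significantly from the diploid baseline, in which case the sketch reports it as unchanged.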

...

Figure. Viewing the segmentation spreadsheet

If desired, you can use Merge Adjacent Regions under Tools in the main toolbar to combine similar regions.

...

  • Right-click a row header in the segmentation spreadsheet; here we have chosen row 5. 
  • Select Browse to location from the pop-up menu 

...

The Genomic Segmentation track displays the segmentation results (Figure 5). Each line in the track represents a sample. Amplified, deleted, and unchanged regions are shown in red, blue, and white, respectively. The Profile track now also includes information from the segmentation spreadsheet for the selected sample.

...

Analyzing shared regions of copy number variation

Once amplified and deleted regions in each sample have been detected, we can compare the regions across samples to detect copy number changes that are shared by multiple samples.

 

  • Select Analyze detected segments from the Copy Number Analysis section of the workflow

The Analyze Segments task (Figure 6) can test for associations between copy number variations and sample categories using the χ² test. In this tutorial, all pairs share the same phenotype, so we will not test for associations.

...

The task generates a new spreadsheet, summary (segment-analysis) (Figure 7), with one region per row. The columns provide the following information:

...

A "?" indicates that a region with the particular characteristic does not exist or cannot be computed. For example, if a region is not amplified in any of the samples, the average amplified copy number will be shown as "?". This list may be filtered to contain only regions that meet user-specified criteria, as discussed in the next section of the tutorial.

...

To get an overview of the common aberrations in the group of samples over the entire genome, we can use View Detected Regions.

  • Select View Detected Regions 

The View Detected Regions dialog (Figure 7) allows you to select the spreadsheet with genomic regions and choose between the histogram and copy number classification plots. 

...

The Karyogram View shows each chromosome with red and blue histograms on either side corresponding to amplification and deletion, respectively. The histogram height reflects the number of samples that share either amplification or deletion in that particular region. For example, the long arms of chromosomes 3 and 7 are amplified in the majority of samples, and most samples share a deletion in the long arm of chromosome 4.
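The histogram heights amount to a per-region count of samples. A minimal sketch of that counting follows, assuming per-sample calls are available as (sample, region, state) tuples; the `histogram_heights` name, the state labels, the region labels, and the example calls are all invented for illustration and do not reflect how Partek Genomics Suite stores its results.

```python
from collections import defaultdict

def histogram_heights(calls):
    """Count, per region, how many distinct samples are amplified or deleted.
    calls: iterable of (sample, region, state) tuples, where state is
    'amplified', 'deleted', or 'unchanged' (hypothetical labels)."""
    amplified = defaultdict(set)
    deleted = defaultdict(set)
    for sample, region, state in calls:
        if state == "amplified":
            amplified[region].add(sample)
        elif state == "deleted":
            deleted[region].add(sample)
    # Using sets means a sample is counted once per region even if listed twice.
    amp_heights = {region: len(samples) for region, samples in amplified.items()}
    del_heights = {region: len(samples) for region, samples in deleted.items()}
    return amp_heights, del_heights

# Invented example calls for three samples
calls = [
    ("S1", "3q", "amplified"), ("S2", "3q", "amplified"), ("S3", "3q", "amplified"),
    ("S1", "4q", "deleted"), ("S2", "4q", "deleted"), ("S3", "4q", "unchanged"),
    ("S1", "7q", "amplified"), ("S3", "7q", "amplified"),
]
amp_heights, del_heights = histogram_heights(calls)
print("amplified:", amp_heights)
print("deleted:", del_heights)
```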

...