Principal component analysis (PCA) can be used to visualize clusters in the methylation data, and it also serves as a quality control step: outliers within a group may indicate poor data quality, batch effects, mislabeled samples, or uninformative groupings.

Each dot in the plot is a single sample and summarizes that sample's methylation status across all CpG loci. The two LCL samples do not cluster together, but we will not exclude them for this tutorial.
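To make the idea concrete, the sketch below shows one way to obtain per-sample PCA coordinates from a loci-by-samples matrix of M-values. It is not the code used to produce the tutorial figure: the function name `pca_sample_scores`, the sample labels, and the synthetic data are all assumptions made for illustration, and the PCA is computed with a plain NumPy singular value decomposition.

```python
import numpy as np

def pca_sample_scores(m_values, n_components=2):
    """Project samples onto the top principal components.

    m_values: array of shape (n_loci, n_samples), one column per sample.
    Returns (scores, explained_ratio); scores has shape
    (n_samples, n_components).
    """
    x = np.asarray(m_values, dtype=float).T   # samples as rows
    x = x - x.mean(axis=0)                    # center each locus
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)           # fraction of variance per PC
    return scores, explained[:n_components]

# Synthetic stand-in data (NOT the tutorial data set):
# 500 loci, two hypothetical groups of three samples each.
rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=(500, 1))
group_shift = rng.normal(0.0, 2.0, size=(500, 1))
group_a = base + rng.normal(0.0, 0.3, size=(500, 3))
group_b = base + group_shift + rng.normal(0.0, 0.3, size=(500, 3))
m_values = np.hstack([group_a, group_b])

scores, explained = pca_sample_scores(m_values)
labels = ["A1", "A2", "A3", "B1", "B2", "B3"]  # hypothetical sample names
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label}: PC1={pc1:8.2f}  PC2={pc2:8.2f}")
```

Plotting the two score columns against each other gives one dot per sample. A sample that lands far from the rest of its group on such a plot would be a candidate for the kind of quality control follow-up described above.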

 

 

Next, the distribution of M-values across the samples can be inspected with a box-and-whisker plot.

Each box-and-whisker represents one sample, and the y-axis shows the range of M-values. The samples in this data set appear reasonably uniform (Figure 2).
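As a rough illustration of what each box-and-whisker encodes, the following sketch computes the quartiles and Tukey-style whisker ends for each sample. The helper `boxplot_stats` and the synthetic beta values are assumptions for this example only. The M-values are derived with the usual logit transform, M = log2(beta / (1 - beta)).

```python
import numpy as np

def boxplot_stats(values):
    """Return the quantities a Tukey box-and-whisker plot draws for one sample."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers reach the most extreme points within 1.5 * IQR of the box.
    lower = v[v >= q1 - 1.5 * iqr].min()
    upper = v[v <= q3 + 1.5 * iqr].max()
    return {"lower": lower, "q1": q1, "median": median, "q3": q3, "upper": upper}

# Synthetic beta values (NOT the tutorial data): 1000 loci, 4 samples.
rng = np.random.default_rng(1)
beta = rng.uniform(0.05, 0.95, size=(1000, 4))
m_values = np.log2(beta / (1.0 - beta))   # logit transform to M-values

stats = [boxplot_stats(m_values[:, j]) for j in range(m_values.shape[1])]
for j, s in enumerate(stats):
    print(f"sample {j}: median={s['median']:.2f}  "
          f"box=[{s['q1']:.2f}, {s['q3']:.2f}]  "
          f"whiskers=[{s['lower']:.2f}, {s['upper']:.2f}]")
```

Samples whose medians or box widths differ markedly from the others would stand out in the plot. Here the synthetic samples are drawn from the same distribution, so their summaries should look alike.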

 

A histogram offers an alternative view of the distribution of M-values.

Again, no sample in the tutorial data set stands out, although the variance appears to be higher within the LCL group (Figure 3).
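The comparison a histogram supports can also be sketched numerically: bin every sample on shared edges so the counts are comparable, then compare the spread of each sample. The function `m_value_histograms`, the bin count, and the synthetic "narrow" and "wide" samples below are illustrative assumptions, not the tutorial's data or plotting code.

```python
import numpy as np

def m_value_histograms(m_values, bins=20):
    """Histogram each sample (column) on a common set of bin edges."""
    m = np.asarray(m_values, dtype=float)
    edges = np.linspace(m.min(), m.max(), bins + 1)   # shared bin edges
    counts = np.stack(
        [np.histogram(m[:, j], bins=edges)[0] for j in range(m.shape[1])]
    )
    return counts, edges

# Synthetic M-values (NOT the tutorial data): two samples with a narrow
# distribution and two with a wider one, 2000 loci each.
rng = np.random.default_rng(2)
narrow = rng.normal(0.0, 1.5, size=(2000, 2))
wide = rng.normal(0.0, 2.5, size=(2000, 2))
m_values = np.hstack([narrow, wide])

counts, edges = m_value_histograms(m_values)
variances = m_values.var(axis=0, ddof=1)   # per-sample variance
for j, var in enumerate(variances):
    print(f"sample {j}: variance={var:.2f}  peak bin count={counts[j].max()}")
```

Samples with a wider distribution show flatter, broader histograms and larger variances, which is the kind of difference one would look for when comparing groups.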