Each row of the spreadsheet (Figure 1) corresponds to a single sample. The first column lists the names of the .idat files and the remaining columns are the array probes. The table values are β-values, which represent the proportion of methylation at each site, ranging from 0 (unmethylated) to 1 (fully methylated). A β-value is calculated as the ratio of the methylated probe intensity to the overall intensity at each site, where the overall intensity is the sum of the methylated and unmethylated probe intensities.
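As a rough illustration of the β-value definition above, the following Python sketch computes the ratio for a single site. The function name `beta_value` and the intensity numbers are hypothetical, chosen only for illustration; they are not part of the tutorial data or of the software.

```python
def beta_value(methylated, unmethylated):
    """Return methylated intensity divided by the overall intensity."""
    total = methylated + unmethylated  # overall intensity at the site
    if total <= 0:
        raise ValueError("overall intensity must be positive")
    return methylated / total


# Hypothetical probe intensities, for illustration only.
print(beta_value(3000.0, 1000.0))  # 0.75
```

A site with equal methylated and unmethylated intensities would give 0.5 under this definition.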

 

M-values are an alternative metric for measuring methylation levels. β-values can be easily converted to M-values using the following equation:

M-value = log2( β / (1 - β))

An M-value close to 0 for a CpG site indicates a similar intensity between the methylated and unmethylated probes, which means the CpG site is about half-methylated. Positive M-values mean that more molecules are methylated than unmethylated, while negative M-values mean that more molecules are unmethylated than methylated.  As discussed by Du and colleagues, the β-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels.
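The conversion can be sketched in a few lines of Python. This is a minimal illustration of the equation above, not the software's implementation; the function names `beta_to_m` and `m_to_beta` are hypothetical, and the inverse function is simply the equation solved for β.

```python
import math


def beta_to_m(beta):
    """Convert a beta-value to an M-value: log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        # The logit is undefined at exactly 0 or 1.
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Invert the conversion: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


print(beta_to_m(0.5))            # 0.0 (half-methylated site)
print(round(beta_to_m(0.8), 3))  # 2.0 (mostly methylated, positive M-value)
print(round(beta_to_m(0.2), 3))  # -2.0 (mostly unmethylated, negative M-value)
```

Note that β-values of exactly 0 or 1 have no finite M-value, so the sketch rejects them rather than guessing how they should be handled.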

Because we are performing differential methylation analysis, we need to convert our data from β-values to M-values.

The original data (β-values) will be overwritten.

Before we can perform any analysis, the study samples need to be organized into their experimental groups.

The Create categorical attribute dialog allows us to create groups for a categorical attribute. By default, two groups are created, but additional groups can be added. 

Sample ID                       Cell Type
GSM2452106_200483200025_R04C01  B cells
GSM2452107_200483200021_R01C01  B cells
GSM2452108_200483200021_R02C01  B cells
GSM2452109_200483200025_R06C01  B cells
GSM2452110_200483200025_R07C01  B cells
GSM2452111_200483200021_R08C01  B cells
GSM2452112_200483200021_R06C01  B cells
GSM2452113_200483200021_R04C01  B cells
GSM2452114_200483200025_R01C01  LCLs
GSM2452115_200483200025_R03C01  LCLs
GSM2452116_200483200021_R03C01  LCLs
GSM2452117_200483200025_R05C01  LCLs
GSM2452118_200483200025_R02C01  LCLs
GSM2452119_200483200021_R07C01  LCLs
GSM2452120_200483200021_R05C01  LCLs
GSM2452121_200483200025_R08C01  LCLs

There should now be two groups with eight samples in each group (Figure 3).

 

Sample ID                       Gender
GSM2452106_200483200025_R04C01  Female
GSM2452107_200483200021_R01C01  Female
GSM2452108_200483200021_R02C01  Male
GSM2452109_200483200025_R06C01  Female
GSM2452110_200483200025_R07C01  Female
GSM2452111_200483200021_R08C01  Female
GSM2452112_200483200021_R06C01  Female
GSM2452113_200483200021_R04C01  Male
GSM2452114_200483200025_R01C01  Female
GSM2452115_200483200025_R03C01  Female
GSM2452116_200483200021_R03C01  Male
GSM2452117_200483200025_R05C01  Female
GSM2452118_200483200025_R02C01  Female
GSM2452119_200483200021_R07C01  Female
GSM2452120_200483200021_R05C01  Female
GSM2452121_200483200025_R08C01  Male

There should now be two groups with four samples in Male and twelve samples in Female (Figure 4).

Two new columns have been added to spreadsheet 1 (Methylation) with the cell type and gender of each sample. 
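To double-check the group sizes outside the software, the two tables above can be tallied with a short Python sketch. The sample IDs and labels are copied from those tables; the variable names and the tuple layout are assumptions made for this illustration.

```python
from collections import Counter

# (sample ID, cell type, gender), copied from the two tables above.
samples = [
    ("GSM2452106_200483200025_R04C01", "B cells", "Female"),
    ("GSM2452107_200483200021_R01C01", "B cells", "Female"),
    ("GSM2452108_200483200021_R02C01", "B cells", "Male"),
    ("GSM2452109_200483200025_R06C01", "B cells", "Female"),
    ("GSM2452110_200483200025_R07C01", "B cells", "Female"),
    ("GSM2452111_200483200021_R08C01", "B cells", "Female"),
    ("GSM2452112_200483200021_R06C01", "B cells", "Female"),
    ("GSM2452113_200483200021_R04C01", "B cells", "Male"),
    ("GSM2452114_200483200025_R01C01", "LCLs", "Female"),
    ("GSM2452115_200483200025_R03C01", "LCLs", "Female"),
    ("GSM2452116_200483200021_R03C01", "LCLs", "Male"),
    ("GSM2452117_200483200025_R05C01", "LCLs", "Female"),
    ("GSM2452118_200483200025_R02C01", "LCLs", "Female"),
    ("GSM2452119_200483200021_R07C01", "LCLs", "Female"),
    ("GSM2452120_200483200021_R05C01", "LCLs", "Female"),
    ("GSM2452121_200483200025_R08C01", "LCLs", "Male"),
]

# Count how many samples fall into each category of each attribute.
cell_type_counts = Counter(cell_type for _, cell_type, _ in samples)
gender_counts = Counter(gender for _, _, gender in samples)

print(dict(cell_type_counts))  # {'B cells': 8, 'LCLs': 8}
print(dict(gender_counts))     # {'Female': 12, 'Male': 4}
```

The counts match the group sizes expected in Figures 3 and 4.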

For this tutorial, we want to exclude all male samples to simplify our analysis. To do this, we can use the interactive filter. 

For categorical columns, the interactive filter displays each category of the selected column as a colored bar. For 2. Gender, each bar represents a gender (Female or Male), with the height of the bar representing the number of samples in that category. To filter out a category, left-click its bar. Right-clicking a bar includes only the selected category. A pop-up balloon shows the category label as you mouse over each bar.

The yellow and black bar on the right-hand side of the spreadsheet panel shows the fraction of excluded rows in black and included rows in yellow. Right-clicking this bar brings up an option to clear the filter. 
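The effect of the filter can be sketched in Python as keeping only the rows whose category is selected. This is an illustration of the idea, not of how the software implements it; the helper `filter_by_category` is hypothetical, and the rows are a four-sample subset of the tables above.

```python
# Subset of the sample tables above: (sample ID, cell type, gender).
rows = [
    ("GSM2452106_200483200025_R04C01", "B cells", "Female"),
    ("GSM2452108_200483200021_R02C01", "B cells", "Male"),
    ("GSM2452114_200483200025_R01C01", "LCLs", "Female"),
    ("GSM2452116_200483200021_R03C01", "LCLs", "Male"),
]

GENDER = 2  # index of the gender field in each row tuple


def filter_by_category(rows, column, keep):
    """Return only the rows whose value in `column` is in `keep`."""
    return [row for row in rows if row[column] in keep]


# Equivalent to excluding the Male bar in the interactive filter.
included = filter_by_category(rows, GENDER, {"Female"})

# Fraction of rows excluded, analogous to the black part of the bar.
excluded_fraction = 1 - len(included) / len(rows)

print([row[0] for row in included])
print(excluded_fraction)  # 0.5
```

The original `rows` list is left unchanged, which mirrors the way clearing the filter restores all rows.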

Now that we have filtered out male samples, we will create a spreadsheet containing only female samples.