Differential Methylation Analysis

Illumina’s MethylationEPIC array interrogates the methylation status of over 850,000 cytosines in the human genome. Because the MethylationEPIC array is closely related to the Infinium HumanMethylation450 BeadChip, the steps presented in this document can be applied to either platform.

This tutorial illustrates how to:

Import and normalize methylation data
Annotate samples

Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

Description of the Data Set

The data set accompanying this document consists of sixteen human samples processed by Illumina MethylationEPIC arrays. The data set is taken from a study of DNA methylation in human B cells and B cells infected with Epstein-Barr virus (EBV).

Infecting B cells with EBV in vitro transforms them, making them capable of indefinite growth in vitro. These immortalized cell lines are referred to as lymphoblastoid cell lines (LCLs). LCLs behave similarly to activated B cells, making them useful for expanding T cells in vitro. Because EBV is a carcinogen and immortalized cell growth is a hallmark of cancer, examining the effects of EBV transformation on B cell DNA methylation might shed light on the roles of DNA methylation in tumor development.

The data files can be downloaded from Gene Expression Omnibus using accession number or by selecting this link - . To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program.

Additional Assistance

If you need additional assistance, please visit to submit a help ticket or find phone numbers for regional support.

Import and normalize methylation data

To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program. The .idat files can be downloaded in a zipped folder using this link - Differential Methylation Analysis data set.

Store the 32 .idat files at C:\Partek Training Data\Methylation or in a directory of your choosing. We recommend creating a dedicated folder for the tutorial.
Go to the Workflows drop-down list and select Methylation (Figure 1)

Figure 1. Selecting the methylation workflow

Select Microarray Loci Methylation from the Methylation sub-workflows panel (Figure 2)

Figure 2. Selecting the Illumina BeadArray Methylation workflow

This opens the Illumina BeadArray Methylation workflow (Figure 3)

Figure 3. Illumina BeadArray Methylation workflow

Select Import Illumina Methylation Data to bring up the Load Methylation Data dialog
Select Import human methylation 450/850 .idat files (Figure 4)

Figure 4. Selecting human methylation 450/850 .idat file type for import

Select OK
Select Browse... to navigate to the folder where you stored the .idat files

All .idat files in the folder will be selected by default (Figure 5).

Figure 5. Selecting .idat files to import

Select Add File(s) > to move the files to the idat Files to Process pane of the Import Illumina iDAT Data dialog (Figure 6)

Figure 6. Confirming selection of .idat files for import

Select Next >

The following dialog (Figure 7) deals with the manifest file, i.e. the probe annotation file. If a manifest file is not present locally, it will be downloaded to the Microarray libraries directory automatically. The download takes place in the background, with no message on the screen, and may take a few minutes depending on the internet connection. In the future, you may want to reanalyze a data set using the same version of the manifest file used during the initial analysis, rather than downloading an up-to-date version. To facilitate this, the Manual specify option in the Manifest File section lets you choose a specific version. For this tutorial, we will leave this on the default settings.

Figure 7. Selecting manifest file and output file

By default, the output file destination is set to the folder containing your .idat files and the output file name matches the folder name. The name and location of the output file can be changed using the Output File panel.

Select Customize to view advanced options for data normalization

In the Algorithm tab of the Advanced Import Options dialog (Figure 8), there are two filtering options and five normalization options available. The filters allow you to exclude probes located on the X and Y chromosomes or probes that fail a detection p-value threshold. In this tutorial, we have male and female samples, so we will apply the X and Y chromosome filter. We will also filter probes based on detection p-value to exclude low-quality probes.

Select Exclude X and Y Chromosomes

Analysis of differentially methylated loci in humans and mice often excludes probes on the X and Y chromosomes because of the difficulties caused by the inactivation of one X chromosome in female samples.

Select Exclude probes using detection p-value and leave the default settings of 0.05 and 1 out of 16 samples.
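The detection p-value filter can be pictured with a short Python sketch. It assumes that a probe is excluded when its detection p-value exceeds the threshold (0.05) in at least the stated number of samples (1); the exact rule applied by Partek Genomics Suite is not spelled out in this tutorial, and the function name, parameter names, and probe values below are hypothetical.

```python
def probes_to_keep(detection_p, p_threshold=0.05, failing_samples_to_exclude=1):
    """Return the IDs of probes that pass the detection p-value filter.

    detection_p maps a probe ID to its detection p-values, one per sample.
    Assumption: a probe is dropped when at least `failing_samples_to_exclude`
    samples have a p-value above `p_threshold`.
    """
    kept = []
    for probe_id, p_values in detection_p.items():
        failing = sum(1 for p in p_values if p > p_threshold)
        if failing < failing_samples_to_exclude:
            kept.append(probe_id)
    return kept


# Hypothetical detection p-values for three probes across four samples.
detection_p = {
    "cg_ok": [0.001, 0.010, 0.020, 0.040],
    "cg_one_bad": [0.001, 0.200, 0.010, 0.010],
    "cg_all_bad": [0.500, 0.600, 0.700, 0.900],
}
kept = probes_to_keep(detection_p)
```

With the assumed rule, a single failing sample is enough to remove a probe, so only the first probe survives in this toy example.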

We recommend using the default option for normalization; however, advanced users can select their preferred normalization method. Select the () next to each normalization option for details. If you want to import probe intensity, raw probe intensity, probe signals, raw probe signals, or anti-log probe intensity values, they can be added to the data import using the Outputs tab of the Advanced Import Options dialog. Probe intensities and raw probe intensities can be used for advanced troubleshooting purposes and antilog probe intensities can be used for copy number detection. The Outputs tab of the Advanced Import Options dialog also has an option to create NCBI GEO submission spreadsheets from your imported data. For this tutorial, we do not need to import any of these values or create GEO submission spreadsheets.

Figure 8. Advanced Import Options offers choice of normalization method and additional data outputs

Select OK to close the Advanced Import Options dialog
Select Import on the Import Illumina iDAT data dialog

The imported and normalized data will appear as spreadsheet 1 (Methylation Tutorial) (Figure 9)

Figure 9. Viewing the imported methylation data in a spreadsheet

Annotate samples

Each row of the spreadsheet (Figure 1) corresponds to a single sample. The first column contains the names of the .idat files and the remaining columns are the array probes. The table values are β-values, which range from 0 to 1 and correspond to the fraction of methylation at each site. A β-value is calculated as the ratio of methylated probe intensity to the overall intensity at each site (the overall intensity is the sum of methylated and unmethylated probe intensities).

Figure 1. Spreadsheet after .idat file import: samples on rows (Sample IDs are based on file names), probes on columns, cell values are functionally normalized beta values (default settings)
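As an illustration of the β-value definition given above, the following Python sketch computes β-values from methylated and unmethylated probe intensities. The function name and the intensity values are hypothetical, and the sketch ignores normalization, which the import step applies before the values appear in the spreadsheet.

```python
def beta_value(methylated: float, unmethylated: float) -> float:
    """Ratio of methylated intensity to the overall (methylated + unmethylated) intensity."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("overall intensity must be positive")
    return methylated / total


# Hypothetical (methylated, unmethylated) intensities for three CpG sites in one sample.
intensities = {
    "site_A": (9000.0, 1000.0),  # mostly methylated
    "site_B": (2500.0, 2500.0),  # about half methylated
    "site_C": (400.0, 7600.0),   # mostly unmethylated
}
betas = {site: beta_value(m, u) for site, (m, u) in intensities.items()}
```

In this toy example the three sites come out at roughly 0.9, 0.5, and 0.05, matching the interpretation of β as the fraction of methylated molecules at a site.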

Before we can perform any analysis, the study samples need to be organized into their experimental groups.

Select Add Sample Attributes

Perform data quality analysis and quality control

Principal component analysis (PCA) can be performed to visualize clusters in the methylation data, but also serves as a quality control procedure; outliers within a group could suggest poor data quality, batch effects, mislabeled samples, or uninformative groupings.

Select PCA Scatter Plot from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Scatter Plot tab
Select 2. Cell Type for Color by

Detect differentially methylated loci

To detect differential methylation at CpG loci between experimental groups, we can perform an ANOVA test. For this tutorial, we will perform a simple two-way ANOVA to compare the methylation states of the two experimental groups.

Select Detect Differential Methylation from the Analysis section of the Illumina BeadArray Methylation workflow

A new child spreadsheet, mvalue, is created when Detect Differential Methylation is selected. M-values are an alternative metric for measuring methylation. β-values can be easily converted to M-values using the following equation: M-value = log2( β / (1 - β)).
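The conversion equation above can be sketched in a few lines of Python. The function names are hypothetical; the inverse function is included only to show that the transformation is reversible.

```python
import math


def beta_to_m(beta: float) -> float:
    """M-value = log2(beta / (1 - beta)); undefined at exactly 0 or 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of beta_to_m: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# A half-methylated site maps to M = 0; higher beta gives positive M, lower beta gives negative M.
m_half = beta_to_m(0.5)
m_high = beta_to_m(0.8)
m_low = beta_to_m(0.2)
```

Because β-values of exactly 0 or 1 would lead to a division by zero or a logarithm of zero, the sketch rejects them rather than guessing at how they should be handled.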

An M-value close to 0 for a CpG site indicates a similar intensity between the methylated and unmethylated probes, which means the CpG site is about half-methylated. Positive M-values mean that more molecules are methylated than unmethylated, while negative M-values mean that more molecules are unmethylated than methylated. As discussed by

Create a marker list

To analyze differences in methylation between our experimental groups, we need to create a list of differentially methylated loci.

Select Create Marker List from the Analysis section of the Illumina BeadArray Methylation workflow
Select LCLs vs. B cells (Figure 1)

Figure 1. Creating a list of significantly differentially methylated loci

Filter loci with the interactive filter

The list, LCLs vs B cells, includes differentially methylated loci for locations across the genome; however, in many cases we may want to focus on loci located in particular regions of the genome. To filter our list to include only regions of interest, we can use the annotations provided by Illumina and the interactive filter in Partek Genomics Suite.

Select LCLs_Vs_B_cells from the spreadsheet tree
Right-click on the Gene Symbol column

Obtain methylation signatures

The significant CpG loci detected in the previous step actually form a methylation signature that differentiates between LCLs and B cells. We can build and visualize this methylation signature using clustering and a heat map.

Select the LCLs_vs_Bcells_CpG_Islands spreadsheet in the spreadsheet pane on the left
Select Cluster Based on Significant Genes from the Visualization panel of the Illumina BeadArray Methylation workflow
Select Hierarchical Clustering for Specify Method (Figure 1)

Figure 1. Selecting Hierarchical Clustering for clustering method

Select OK
Verify that LCLs_vs_Bcells_CpG_Islands is selected in the drop-down menu
Verify that Standardize is selected for Expression normalization (Figure 2)

Figure 2. Selecting spreadsheet and normalization method for clustering

Select OK

The heat map will be displayed on the Hierarchical Clustering tab (Figure 3).

Figure 3. Hierarchical clustering with heat map invoked on a list of significant CpG loci

The experimental groups are rows, while the CpG loci from the LCLs vs B cells spreadsheet are columns. Methylation levels are compared between the LCLs and B cells groups. CpG loci with higher methylation are colored red; CpG loci with lower methylation are colored green. LCLs samples are colored orange and B cells samples are colored red in the dendrogram on the left-hand side of the heat map.
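The Standardize setting chosen before clustering can be illustrated with a small Python sketch. It assumes that standardization means shifting each locus to a mean of zero and scaling it to a standard deviation of one across samples, and it uses the sample standard deviation; both points are assumptions about the option rather than a description of the software's internals, and the locus names and values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Shift to mean 0 and scale to (sample) standard deviation 1."""
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        # A locus with identical values in every sample carries no contrast.
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


# Hypothetical M-values for two loci across three samples.
m_values_by_locus = {
    "cg_variable": [1.0, 2.0, 3.0],
    "cg_flat": [0.5, 0.5, 0.5],
}
standardized = {locus: standardize(vals) for locus, vals in m_values_by_locus.items()}
```

Standardizing each locus puts all loci on a comparable scale, so the red and green colors in the heat map reflect methylation relative to that locus's own average rather than its absolute level.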

Visualize methylation at each locus

Partek Genomics Suite enables you to visualize each probe and compare the methylation between the groups at a single CpG site level.

Right-click row 5. SBNO2 in the LCLs_vs_B_Cells_CpG_Islands spreadsheet
Select Browse to Location from the pop-up menu

Perform gene set and pathway analysis

To perform gene set and pathway analysis, we need to create a list of genes that overlap with differentially methylated CpG loci.

Select LCLs_vs_B_cells_CpG_Islands in the spreadsheet tree
Select Find Overlapping Genes from the Analysis section of the workflow

The Output Overlapping Features dialog will open (Figure 1). This dialog allows you to choose the annotation database that will define where genes are located. By default, the promoter region will be defined as 5000 base pairs upstream and 3000 base pairs downstream from the transcription start site.
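The default promoter definition can be made concrete with a short Python sketch. It assumes that "upstream" and "downstream" are interpreted relative to the gene's strand, that coordinates are 1-based, and that the window boundaries are inclusive; the function names and coordinates are hypothetical and are not taken from the software.

```python
def promoter_window(tss: int, strand: str, upstream: int = 5000, downstream: int = 3000):
    """Return (start, end) of the promoter region around a transcription start site."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end


def in_promoter(position: int, tss: int, strand: str) -> bool:
    """True if a CpG position falls inside the promoter window (inclusive)."""
    start, end = promoter_window(tss, strand)
    return start <= position <= end


# Hypothetical gene with a transcription start site at position 10,000.
forward_window = promoter_window(10_000, "+")
reverse_window = promoter_window(10_000, "-")
```

The same CpG position can therefore fall inside the promoter of a forward-strand gene and outside the promoter of a reverse-strand gene that shares the same transcription start site coordinate.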

Detect differentially methylated CpG islands

The approach described in previous sections relies on ANOVA to detect differentially methylated CpG sites and takes individual sites as a starting point for interpretation. Since ANOVA compares M values at each site independently, this strategy is robust to type I/type II probe bias.

An alternative could be to first summarize all the probes belonging to a CpG island region (i.e. island, N-shore, N-shelf, S-shore, S-shelf) and then use ANOVA to compare regions across the groups. Since the summarization will include both type I and type II probes, you may want to split the analysis in two branches and analyze type I and type II probes independently. To do this, we need to annotate each probe as type I or type II.

Select the mvalue spreadsheet
Select Transform from the main toolbar
Select Create Transposed Spreadsheet... from the Transform drop-down menu (Figure 1)

Figure 1. Creating a transposed spreadsheet

Select Sample ID for Column: and numeric for Data Type:
Select OK

A new temporary spreadsheet will be created with a row for each probe and columns for each sample.

Right-click on column 1. ID to bring up the pop-up menu
Select Insert Annotation
Select Add as categorical

Figure 2. Adding Infinium design type and CpG island annotations

Select OK to add the Infinium design type and UCSC CpG island name as categorical columns on the spreadsheet

Now, we can use the interactive filter to create separate spreadsheets for type I and type II probes.

Select () to launch the interactive filter
Select 2. Infinium_Design_Type from the drop-down menu if not selected by default
Left-click the type I column to exclude it

Figure 3. Creating a probe list with only Infinium type II probes

Name the new spreadsheet female_only_typeII_probes
Select OK
Save the created spreadsheet; we chose the file name female_only_typeII_probes

The temporary spreadsheet is no longer needed so we can close it.

Close the temporary spreadsheet by selecting it in the file tree and selecting ()

We can use these spreadsheets to generate lists of M values at CpG island regions.

Select spreadsheet female_only_typeII_probes
Select Stat from the main toolbar
Select Column Statistics... under Descriptive (Figure 4)

Figure 4. Selecting column statistics

Add Mean to the Selected Measure(s) panel
Select Group By and set it to 3. UCSC_CpG_Islands_Name (Figure 5)

Figure 5. Configuring column statistics

Select OK

The new temporary spreadsheet has one CpG island region per row (Figure 6), samples on columns, and the values in the cells represent the mean of M values of all the CpG probes in the region.

Figure 6. New spreadsheet with average M values for probes at each CpG island; probes not at CpG islands are collected into the first row "- Mean"

Note the first row, with label “– Mean”. It corresponds to all the probes that map outside of UCSC CpG islands. As it is not needed for the downstream analysis, we will remove it.

Right-click on the row header for Mean
Select Delete to remove the row
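The filtering and summarization steps above can be pictured with a small Python sketch: keep only type II probes, group them by CpG island name, average the M values per sample, and drop probes that map outside any island. The probe IDs, island names, and values are hypothetical, and the sketch is a simplified stand-in for the spreadsheet operations, not a description of how the software performs them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical probe table: (probe ID, Infinium design type, UCSC CpG island name,
# M values for sample 1 and sample 2). An empty island name means "not in an island".
probes = [
    ("cg0001", "II", "chr1:1000-2000", [1.0, 2.0]),
    ("cg0002", "II", "chr1:1000-2000", [3.0, 4.0]),
    ("cg0003", "I", "chr1:1000-2000", [9.0, 9.0]),
    ("cg0004", "II", "", [0.5, 0.5]),
    ("cg0005", "II", "chr2:500-900", [-1.0, -3.0]),
]


def island_means(probes, design_type="II"):
    """Mean M value per CpG island and sample, using probes of one design type only."""
    grouped = defaultdict(list)
    for _probe_id, dtype, island, m_values in probes:
        if dtype != design_type or not island:
            continue  # skip the other design type and probes outside islands
        grouped[island].append(m_values)
    # zip(*rows) regroups the values by sample so each sample is averaged separately.
    return {island: [mean(col) for col in zip(*rows)] for island, rows in grouped.items()}


type2_island_means = island_means(probes)
```

Calling island_means(probes, design_type="I") would give the corresponding summary for type I probes, which mirrors the suggestion to analyze the two probe types in separate branches.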

The final step is to transpose the data back to its original orientation.

Select Transform from the main toolbar
Select Create Transposed Spreadsheet... from the Transform drop-down menu
Select 2. Level for Column: and numeric for Data Type:

The layout of the new transposed spreadsheet is as follows: one sample per row with CpG island regions on columns; cell entries correspond to mean methylation status of the region (Figure 7). The column with a blank value for the column header is the average of all probes not associated with CpG island regions. You can delete this column if you like.

Figure 7. Spreadsheet with average M values of probes in each CpG island for each sample

Right-click the transposed spreadsheet, 2_transpose
Select Save as... from the pop-up menu
Name it mvalues_typeII_probes_CpG_islands

The mvalues_typeII_probes_CpG_islands spreadsheet can be used as a starting point for ANOVA and other analyses. You can also repeat the steps above to create an equivalent spreadsheet for type I probes.

Optional: Add UCSC CpG island annotations

Partek Genomics Suite software can view annotation .BED files as tracks in the Genome Viewer. We can add a CpG islands track to the Genome Viewer using the UCSC Genome Browser CpG islands annotation.

Go to UCSC Genome Browser
Select Table Browser under Tools in the main command bar of the webpage (Figure 1)

Figure 1. Navigating to the Table Browser at the UCSC Genome Browser website

Configure the Table Browser page as shown (Figure 2)

Figure 2. Configuring the Table Browser to output CpG Islands BED file

Set assembly to Feb. 2009 (GRCh37/hg19)
Set group to Regulation
Set track to CpG Islands

The Output cpgIslandExt as BED page will open.

Select get BED to download a compressed folder containing the BED file
Unzip the file using 7-Zip, WinRAR, or a similar program of your choice to a location you will be able to find

Next, we can import the BED file into Partek Genomics Suite.

Select Genomic Database... under Import under File in the main toolbar in Partek Genomics Suite (Figure 3)

Figure 3. Importing the CpG Islands map BED file

Select the file cpg.bed

The BED file will open as a new spreadsheet.

Change the spreadsheet name to UCSC CpG Island Annotation and save it

For this region list, you can also calculate the average beta values for the probes in each island per sample and detect differentially methylated CpG island regions. Detailed information on how to get the average beta value for each CpG island can be found in the Determining the average values for a region list section of .
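As a rough illustration of averaging beta values over a region list, the Python sketch below parses BED lines and computes the mean beta value of the probes that fall inside each region. It relies on the BED convention of 0-based start and exclusive end coordinates and assumes probe positions are 1-based; the function names, region names, and probe values are hypothetical, and the sketch does not reproduce the procedure used by Partek Genomics Suite.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples, skipping header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        regions.append((chrom, start, end, name))
    return regions


def region_mean_beta(regions, probes):
    """Mean beta per region; probes are (chrom, 1-based position, beta) tuples."""
    result = {}
    for chrom, start, end, _name in regions:
        # BED start is 0-based and end is exclusive, so a 1-based position p
        # lies inside the region when start < p <= end.
        values = [b for c, p, b in probes if c == chrom and start < p <= end]
        if values:
            result[(chrom, start, end)] = sum(values) / len(values)
    return result


# Hypothetical BED content and probe beta values for one sample.
bed_lines = ["track name=cpg", "chr1\t1000\t2000\tCpG: 50", "chr2\t300\t400\tCpG: 12"]
probe_betas = [
    ("chr1", 1001, 0.2),  # first base of the island
    ("chr1", 2000, 0.4),  # last base of the island
    ("chr1", 1000, 0.9),  # one base before the island
    ("chr1", 2001, 0.9),  # one base after the island
    ("chr3", 350, 0.5),   # chromosome without a listed island
]
regions = parse_bed(bed_lines)
means = region_mean_beta(regions, probe_betas)
```

Regions are keyed by their coordinates rather than by the name column, because the names in a CpG island BED file are not guaranteed to be unique.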

Optional: Use MethylationEPIC for CNV analysis

Although the 450K and MethylationEPIC arrays were initially designed to analyze DNA methylation, they are essentially dense SNP arrays and can be used for copy number analysis (Feber et al. 2014). The probe intensity data is easily parsed from the .idat files by using the Additional Probe Data Spreadsheet Selection dialog (Figure 1) when importing the raw data. Examining the raw intensity data can also be useful for QA/QC purposes.

Follow the steps for importing Illumina methylation data detailed in Import and normalize methylation data until you reach the Import Illumina iDAT Data dialog with Manifest File and Output File panels (Figure 1).

Figure 1. Customizing output during data import

Select Customize... to open the Advanced Import Options dialog
Choose No normalization in the Normalization section of the Algorithm tab
Select the Outputs tab (Figure 2)

Figure 2. Selecting additional probe data to include during data import

Information about the different output options can be found by selecting the adjacent () icon.

Detection p-values. This is the confidence score that the signal of a probe was significantly higher than the background defined by negative control probes. Selecting this checkbox produces a spreadsheet ending with '_detectionp' in addition to the spreadsheet containing beta values. Each row of the _detectionp spreadsheet will be a different sample and the sample names will end in '_detectionp'. This spreadsheet can be used to filter out probes that do not show signal above background.

Probe Intensity. This is the sum of the methylated and unmethylated intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_probe’ in addition to the spreadsheet containing beta values. Each row of the _probe spreadsheet will be a different sample and the file names will also end in ‘_probe.’ The probe intensity values will be log2 transformed by default (note that the beta values are not log2 transformed).

Probe Signal. This option will become available if Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_probe.’ The methylated and unmethylated intensities are shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_meth’ or ‘_unmeth’ for methylated and unmethylated values, respectively. The probe intensity values will be log2 transformed by default.

Raw Probe Intensity. This is the sum of the raw red and green signal intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_raw’ in addition to the spreadsheet containing beta values. Each row of the spreadsheet will be a different sample and the file names will also end in ‘_raw.’ The raw probe intensity values will be log2 transformed by default.

Raw Probe Signal. This option will become available if Raw Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_raw.’ The red and green intensities will be shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_red’ or ‘_green’ for red and green values, respectively. The raw probe intensity values will be log2 transformed by default.

Antilog Probe Intensity Values. Selecting this checkbox will show the probe intensity data without log2 transformation.

Create NCBI GEO Submission Spreadsheets. Generates matrix processed and matrix signal intensities spreadsheets for GEO submission.

How you proceed depends on your study design. Here is an example series of steps to prepare the tutorial data set for copy number analysis:

Select Probe Intensity and Antilog Probe Intensity Values (Figure 2)
Select OK to close the Advanced Import Options dialog
Select Import to import the data and perform the selected normalization method

Configure the Normalize to Baseline 1 dialog as shown (Figure 3)

Select Use control set from this spreadsheet
Set Control Category to B cells
Select Ratio to baseline from the Normalization Method section

Figure 3. Configuring normalize to baseline

Select OK to generate the spreadsheet
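The ratio-to-baseline normalization configured above can be pictured with the following Python sketch. It assumes that the baseline for each probe is the mean antilog intensity across the control samples (here, B cells) and that the ratio is then expressed in log2 space; how Partek Genomics Suite computes the baseline internally is not stated in this tutorial, and the function name and intensity values are hypothetical.

```python
import math
from statistics import mean


def log2_ratio_to_baseline(sample_intensities, control_intensities):
    """Per-probe log2(sample / baseline), where baseline is the mean of the controls.

    sample_intensities: antilog intensities for one sample, one value per probe.
    control_intensities: list of control samples, each with one value per probe.
    """
    ratios = []
    for probe_idx, value in enumerate(sample_intensities):
        baseline = mean(control[probe_idx] for control in control_intensities)
        ratios.append(math.log2(value / baseline))
    return ratios


# Hypothetical antilog probe intensities: two control samples and one test sample.
b_cell_controls = [
    [1000.0, 2000.0, 500.0],
    [1000.0, 2000.0, 500.0],
]
lcl_sample = [1000.0, 3000.0, 250.0]
log2_ratios = log2_ratio_to_baseline(lcl_sample, b_cell_controls)
```

A value of 0 indicates the same intensity as the baseline, positive values suggest a gain, and negative values suggest a loss relative to the control group.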

This spreadsheet contains copy number values per probe in log2 space (i.e. diploid = 0). Prior to performing copy number analysis, you can normalize for local GC abundance.

Select Transform
Select Adjust Based on Local GC Content...
Click OK to run Local GC Adjustment (Figure 4)

Figure 4. Adjusting for local GC content

The GC adjusted spreadsheet is the starting spreadsheet for copy number analysis. You can now switch over to the Copy number workflow, skip the Create copy number step, and begin with the Detect amplifications and deletions step. Consult the user's guide for the copy number workflow for subsequent steps.

References

Feber A, Guilhamon P, Lechner M, et al. Using high-density DNA methylation arrays to profile copy number alterations. Genome Biology. 2014;15(2):R30. doi:10.1186/gb-2014-15-2-r30.

Optional: Import a Partek Project from GenomeStudio

An Illumina-type project file (.bsc format) can be imported in Illumina’s GenomeStudio® (please note: to process 450K chips, you need GenomeStudio 2010 or newer) and exported using the Partek Methylation Plug-in for GenomeStudio. For more information on the plug-in, please see the . The plug-in creates six files: a Partek project file (*.ppj), an annotation file (*.annotation.txt), files containing intensity values (*.fmt and *.txt), and files containing β-values (*.fmt and *.txt) (Figure 1).

Figure 1. Output of Partek Methylation Plug-in for GenomeStudio

To load all the files automatically, open the .ppj file as follows.