Illumina’s MethylationEPIC array interrogates the methylation status of over 850,000 cytosines in the human genome. Because the MethylationEPIC array is closely related to the Infinium HumanMethylation450 BeadChip, the steps presented in this document can be applied to either platform.
This tutorial illustrates how to:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
The data set accompanying this document consists of sixteen human samples processed by Illumina MethylationEPIC arrays. The data set is taken from a study of DNA methylation in human B cells and B cells infected with Epstein-Barr virus (EBV).
Infecting B cells with EBV in vitro transforms them, making them capable of indefinite growth in vitro. These immortalized cell lines are referred to as lymphoblastoid cell lines (LCLs). LCLs behave similarly to activated B cells, making them useful for expanding T cells in vitro. Because EBV is a carcinogen and immortalized cell growth is a hallmark of cancer, examining the effects of EBV transformation on B cell DNA methylation might shed light on the roles of DNA methylation in tumor development.
The data files can be downloaded from Gene Expression Omnibus using accession number GSE93373 or by selecting this link - Differential Methylation Analysis data set. To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
To detect differential methylation between CpG loci in different experimental groups, we can perform an ANOVA test. For this tutorial, we will perform a simple two-way ANOVA to compare the methylation states of the two experimental groups.
Select Detect Differential Methylation from the Analysis section of the Illumina BeadArray Methylation workflow
A new child spreadsheet, mvalue, is created when Detect Differential Methylation is selected. M-values are an alternative metric for measuring methylation. β-values can be easily converted to M-values using the following equation: M-value = log2( β / (1 - β)).
An M-value close to 0 for a CpG site indicates a similar intensity between the methylated and unmethylated probes, which means the CpG site is about half-methylated. Positive M-values mean that more molecules are methylated than unmethylated, while negative M-values mean that more molecules are unmethylated than methylated. As discussed by Du and colleagues, the β-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels.
Because we are performing differential methylation analysis, Partek Genomics Suite automatically creates an M-values spreadsheet to use for statistical analysis.
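The conversion between the two metrics can be sketched in a few lines of Python. This is only an illustration of the equation above, not the code Partek Genomics Suite runs; the function names beta_to_m and m_to_beta are hypothetical.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a beta value (strictly between 0 and 1) to an M-value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse transformation: convert an M-value back to a beta value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# A half-methylated site (beta = 0.5) has an M-value of 0.
print(beta_to_m(0.5))
# A mostly methylated site gives a positive M-value, a mostly
# unmethylated site a negative one.
print(beta_to_m(0.8), beta_to_m(0.2))
```

Note that the M-value is undefined at β = 0 and β = 1, which is why the sketch rejects those inputs.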
Select 2. Cell Type and 3. Gender from the Experimental Factor(s) panel
Select Add Factor > to move 2. Cell Type and 3. Gender to the ANOVA Factor(s) panel (Figure 1)
Figure 1. ANOVA setup dialog. Experimental factors listed on the left can be added to the ANOVA model.
Select Contrasts...
Leave Data is already log transformed? set to No
Leave Report comparisons as set to Difference
For methylation data, fold-change comparisons are not appropriate. Instead, comparisons should be reported as the difference between groups.
Select 2. Cell Type from the Select Factor/Interaction drop-down menu
Select LCLs
Select Add Contrast Level > for the upper group
Select B cells
Select Add Contrast Level > for the lower group
Select Add Contrast (Figure 2)
Figure 2. Configuring ANOVA contrasts
Select OK to close the Configuration dialog
The Contrasts... button of the ANOVA dialog now reads Contrasts Included
Select OK to close the ANOVA dialog and run the ANOVA
If this is the first time you have analyzed a MethylationEPIC array using the Partek Genomics Suite software, the manifest file may need to be configured. If it needs configuration, the Configure Annotation dialog will appear (Figure 3).
Select Chromosome is in one column and the physical location is in another column for Choose the column configuration
Select Ilmn ID for Marker ID
Select CHR for Chromosome
Select MAPINFO for Physical Position
Select Close
This enables Partek Genomics Suite to parse out probe annotations from the manifest file.
Figure 3. Processing the annotation file. User needs to point to the columns of the annotation file that contain the probe identifier as well as the chromosome and coordinates of the probe.
The results will appear as ANOVA-2way (ANOVAResults), a child spreadsheet of mvalue. Each row of the spreadsheet represents a single CpG locus (identified by Column ID).
Figure 4. ANOVA spreadsheet. Each row is a result of an ANOVA at a given CpG locus (identified by the Column ID column). The remaining columns contain annotation and statistical output
For each contrast, a p-value, Difference, Difference (Description), Beta Difference, and Beta Difference (Description) are generated. The Difference column reports the difference in M-values between the two groups while the Beta Difference column reports the difference in beta values between the two groups.
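To make the two difference columns concrete, the sketch below computes both quantities for a single CpG locus from plain group means. It is a simplification under stated assumptions: the function name group_differences is hypothetical, and an ANOVA model with more than one factor would typically report differences of adjusted means rather than raw group means.

```python
import math
from statistics import mean


def group_differences(betas_upper, betas_lower):
    """Return (M-value difference, beta difference) for one CpG locus.

    betas_upper: beta values of the upper contrast group (e.g. LCLs)
    betas_lower: beta values of the lower contrast group (e.g. B cells)
    """
    m_upper = [math.log2(b / (1.0 - b)) for b in betas_upper]
    m_lower = [math.log2(b / (1.0 - b)) for b in betas_lower]
    m_difference = mean(m_upper) - mean(m_lower)
    beta_difference = mean(betas_upper) - mean(betas_lower)
    return m_difference, beta_difference


# Illustrative values only: a locus at 80% methylation in one group
# and 50% methylation in the other.
m_diff, beta_diff = group_differences([0.8, 0.8], [0.5, 0.5])
print(m_diff, beta_diff)
```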
Principal component analysis (PCA) can be performed to visualize clusters in the methylation data, but also serves as a quality control procedure; outliers within a group could suggest poor data quality, batch effects, mislabeled samples, or uninformative groupings.
Select PCA Scatter Plot from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Scatter Plot tab
Select 2. Cell Type for Color by
Select 3. Gender for Size by
Select the Rotate Mode icon to enable Rotate Mode
Left click and drag to rotate the plot and view different angles (Figure 1)
Each dot of the plot is a single sample and represents the average methylation status across all CpG loci. Two of the LCLs samples do not cluster with the others, but we will not exclude them for this tutorial.
Figure 1. Principal components analysis (PCA) showing methylation profiles of the study samples. Each sample is represented by a dot, the axes are the first three PCs, and the numbers in parentheses indicate the fraction of variance explained by each PC. The number at the top is the variance explained by the first three PCs. The samples are colored by levels of 2. Cell Type
Next, distribution of beta values across the samples can also be inspected by a box-and-whiskers plot.
Select Sample Box and Whiskers Chart from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Box and Whiskers tab
Each box-and-whisker is a sample and the y-axis shows beta-value ranges. Samples in this data set seem reasonably uniform (Figure 2).
Figure 2. Box and whiskers plot showing distribution of beta values (y-axis) across the study samples (x-axis). Samples are colored by a categorical attribute (Cell Type). The middle line is the median, the box represents the upper and the lower quartile, while the whiskers correspond to the 90th and 10th percentile of the data
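The five numbers drawn for each sample in a chart like this can be sketched with the Python standard library. The function name box_summary is hypothetical, and the percentile interpolation used here (the default method of statistics.quantiles) is an assumption; the software may interpolate differently.

```python
from statistics import median, quantiles


def box_summary(values):
    """Return the 10th percentile, lower quartile, median, upper quartile
    and 90th percentile of one sample's values."""
    deciles = quantiles(values, n=10)   # 9 cut points: 10th ... 90th percentile
    quartiles = quantiles(values, n=4)  # 3 cut points: Q1, median, Q3
    return {
        "p10": deciles[0],
        "q1": quartiles[0],
        "median": median(values),
        "q3": quartiles[2],
        "p90": deciles[-1],
    }


# Illustrative input, not data from the tutorial.
print(box_summary([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]))
```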
An alternative way to take a look at the distribution of beta-values is a histogram.
Select Sample Histogram from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Histogram tab
Again, no sample in the tutorial data set stands out (Figure 3).
Figure 3. Sample histogram. Each sample is a line, beta values are on the horizontal axis and their frequencies on the vertical axis. Two peaks correspond to two probe types (I and II) present on the MethylationEPIC array. Sample colors correspond to a categorical attribute (Cell Type)
To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program. The .idat files can be downloaded in a zipped folder using this link - Differential Methylation Analysis data set.
Store the 32 .idat files in C:\Partek Training Data\Methylation or in a directory of your choosing. We recommend creating a dedicated folder for the tutorial.
From the Workflows drop-down list, select Methylation (Figure 1)
Figure 1. Selecting the methylation workflow
Select Microarray Loci Methylation from the Methylation sub-workflows panel (Figure 2)
Figure 2. Selecting the Illumina BeadArray Methylation workflow
This will open the Illumina BeadArray Methylation workflow (Figure 3)
Figure 3. Illumina BeadArray Methylation workflow
Select Import Illumina Methylation Data to bring up the Load Methylation Data dialog
Select Import human methylation 450/850 .idat files (Figure 4)
Figure 4. Selecting human methylation 450/850 .idat file type for import
Select OK
Select Browse... to navigate to the folder where you stored the .idat files
All .idat files in the folder will be selected by default (Figure 5).
Figure 5. Selecting .idat files to import
Select Add File(s) > to move the files to the idat Files to Process pane of the Import Illumina iDAT Data dialog (Figure 6)
Figure 6. Confirming selection of .idat files for import
Select Next >
The following dialog (Figure 7) deals with the manifest file, i.e. the probe annotation file. If a manifest file is not present locally, it will be downloaded to the Microarray libraries directory automatically. The download takes place in the background with no message on the screen and may take a few minutes, depending on the internet connection. In the future, you may want to reanalyze a data set using the same version of the manifest file used during the initial analysis, rather than downloading an up-to-date version. To facilitate this, the Manual specify option in the Manifest File section allows you to select a specific version. For this tutorial, we will leave this on the default settings.
Figure 7. Selecting manifest file and output file
By default, the output file destination is set to the folder containing your .idat files and the file name matches the folder name. The name and location of the output file can be changed using the Output File panel.
Select Customize to view advanced options for data normalization
In the Algorithm tab of the Advanced Import Options dialog (Figure 8), there are two filtering options and five normalization options available. The filters allow you to exclude probes from the X and Y chromosomes or based on detection p-value. In this tutorial, we have male and female samples, so we will apply the X and Y chromosome Filter. We will also filter probes based on detection p-value to exclude low-quality probes.
Select Exclude X and Y Chromosomes
Analysis of differentially methylated loci in humans and mice often excludes probes on the X and Y chromosomes because of the difficulties caused by the inactivation of one X chromosome in female samples.
Select Exclude probes using detection p-value and leave the default settings of 0.05 and 1 out of 16 samples.
Figure 8. Advanced Import Options offers choice of normalization method and additional data outputs
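The detection p-value filter can be pictured with the sketch below. It assumes one reading of the default settings: a probe is excluded when its detection p-value exceeds 0.05 in at least 1 of the 16 samples. The function name exclude_probe and its parameters are hypothetical and do not correspond to anything exposed by the software.

```python
def exclude_probe(detection_pvalues, threshold=0.05, min_failing_samples=1):
    """Return True if the probe should be excluded.

    detection_pvalues: one detection p-value per sample for this probe.
    A sample "fails" when its detection p-value is above the threshold
    (assumed interpretation of the dialog settings).
    """
    failing = sum(1 for p in detection_pvalues if p > threshold)
    return failing >= min_failing_samples


# Illustrative values: a probe detected well in all 16 samples is kept,
# a probe that fails in a single sample is excluded.
print(exclude_probe([0.001] * 16))
print(exclude_probe([0.001] * 15 + [0.2]))
```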
Select OK to close the Advanced Import Options dialog
Select Import on the Import Illumina iDAT data dialog
The imported and normalized data will appear as spreadsheet 1 (Methylation Tutorial) (Figure 9)
Figure 9. Viewing the imported methylation data in a spreadsheet
Each row of the spreadsheet (Figure 1) corresponds to a single sample. The first column is the names of the .idat files and the remaining columns are the array probes. The table values are β-values, which correspond to the percentage methylation at each site. A β-value is calculated as the ratio of methylated probe intensity over the overall intensity at each site (the overall intensity is the sum of methylated and unmethylated probe intensities).
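As a minimal illustration of this definition, the ratio can be written as follows. The function name beta_value is hypothetical, the intensities are made-up numbers, and the sketch ignores any normalization or offset the import step may apply.

```python
def beta_value(methylated: float, unmethylated: float) -> float:
    """Ratio of methylated probe intensity to the overall intensity
    (methylated + unmethylated) at one CpG site."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("overall intensity must be positive")
    return methylated / total


# Illustrative intensities: three quarters of the signal is methylated.
print(beta_value(3000.0, 1000.0))
```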
Figure 1. Spreadsheet after .idat file import: samples on rows (Sample IDs are based on file names), probes on columns, cell values are functionally normalized beta values (default settings)
Before we can perform any analysis, the study samples need to be organized into their experimental groups.
Select Add Sample Attributes from the Import section of the Illumina BeadArray Methylation workflow
Select Add a Categorical Attribute from the Add Sample Attributes dialog (Figure 2)
Figure 2. Adding sample attributes. Adding Attributes from an Existing Column can be used to split file names into sections, based on delimiters (e.g. _, -, space etc.). Adding a Numeric or Categorical Attribute enables the user to manually specify sample attributes
Select OK
The Create categorical attribute dialog allows us to create groups for a categorical attribute. By default, two groups are created, but additional groups can be added.
Set Attribute name: to Cell Type
Rename the groups B cells and LCLs
Drag and drop the samples from the Unassigned list to their groups as listed in the table below
There should now be two groups with eight samples in each group (Figure 3).
Figure 3. Adding Cell Type attribute as a categorical group
Select OK
Select Yes from the Add another categorical attribute dialog
Set Attribute name: to Gender
Rename the groups Male and Female
Drag and drop the samples from the Unassigned list to their groups as listed in the table below
There should now be two groups with four samples in Male and twelve samples in Female (Figure 4).
Figure 4. Adding Gender attribute as a categorical group
Select OK
Select No from the Add another categorical attribute dialog
Select Yes to save the spreadsheet
Two new columns have been added to spreadsheet 1 (Methylation) with the cell type and gender of each sample (Figure 5).
Figure 5. Annotated beta values spreadsheet
The significant CpG loci detected in the previous step actually form a methylation signature that differentiates between LCLs and B cells. We can build and visualize this methylation signature using clustering and a heat map.
Select the LCLs_vs_Bcells_CpG_Islands spreadsheet in the spreadsheet pane on the left
Select Cluster Based on Significant Genes from the Visualization panel of the Illumina BeadArray Methylation workflow
Select Hierarchical Clustering for Specify Method (Figure 1)
Figure 1. Selecting Hierarchical Clustering for clustering method
Select OK
Verify that LCLs_vs_Bcells_CpG_Islands is selected in the drop-down menu
Verify that Standardize is selected for Expression normalization (Figure 2)
Figure 2. Selecting spreadsheet and normalization method for clustering
Select OK
The heat map will be displayed on the Hierarchical Clustering tab (Figure 3).
Figure 3. Hierarchical clustering with heat map invoked on a list of significant CpG loci
The experimental groups are rows, while the CpG loci from the LCLs vs B cells spreadsheet are columns. Methylation levels are compared between the LCLs and B cells groups. CpG loci with higher methylation are colored red, CpG loci with lower methylation are colored green. LCLs samples are colored orange and B cells samples are colored red in the dendrogram on the left-hand side of the heat map.
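The Standardize option selected before clustering can be thought of as rescaling each CpG locus so that loci with different baseline methylation become comparable in the heat map. The sketch below assumes this means shifting each locus to a mean of zero and scaling to a standard deviation of one; whether the software uses the sample or the population standard deviation is not stated here, so the sample version is an assumption, and the function name standardize is hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Shift one locus's values to mean 0 and scale to standard deviation 1."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (assumption)
    if spread == 0:
        # A locus with identical values in every sample carries no contrast.
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


# Illustrative M-values for one locus across three samples.
print(standardize([1.0, 2.0, 3.0]))
```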
Partek Genomics Suite enables you to visualize each probe and compare the methylation between the groups at a single CpG site level.
Right-click row 5. SBNO2 in the LCLs_vs_B_Cells_CpG_Islands spreadsheet
Select Browse to Location from the pop-up menu
Figure 1. Browsing to location from spreadsheet with differentially expressed genes
The Chromosome View tab will open, zoomed in to the selected CpG locus in SBNO2 (Figure 2).
Figure 2. Viewing location in Genome Viewer
The Chromosome View visualization is composed of a series of tracks corresponding to annotation files and data files.
RefSeq Transcripts 2017-05-02 (hg19) (+): transcripts coded by the positive strand
RefSeq Transcripts 2017-05-02 (hg19) (-): transcripts coded by the negative strand
Regions: by default, difference in methylation (M-value) between the groups
Heatmap (1/mvalue): M values for all the samples
Barchart (Methylation): methylation level in M value of the selected sample (to select a sample, click on a heat map)
Heatmap (Methylation Tutorial): Beta values for all the samples
Barchart (Methylation): methylation level in Beta value of the selected sample (to select a sample, click on a heat map)
Cytoband: cytobands of the current chromosome
Genomic Label: coordinates on the current chromosome
To modify a track, select it in the Tracks panel to bring up its configuration options panel below the Tracks panel. Let's modify a few tracks to improve our visualization of the data.
Select the Regions track; its configuration panel opens to the Profile tab
Select Color tab
Set Color bars by to Difference (LCLs vs. B cells) (Description)
Select Apply to change
This will color regions according to whether their methylation is increased or decreased.
Select the Heatmap (1/mvalue)
Select Remove Track
Select Bar Chart (Methylation) located directly below the Regions track
Select Remove Track
We can now more clearly see the Difference in M values for the region in the Regions track, the heatmap of beta values in the Heatmap track, and the beta value at the locus for the selected sample in the Bar Chart track.
Select a sample on the heatmap to view its beta value in the Bar Chart track (Figure 3)
Figure 3. Modify the tracks of the Genome Viewer to facilitate visual analysis
The available tracks can be supplemented with a special annotation file that can be built using a UCSC annotation file as the basis. Building and viewing the UCSC annotation file is available as an optional section of the tutorial, Optional: Add UCSC CpG island annotations.
To perform gene set and pathway analysis, we need to create a list of genes that overlap with differentially methylated CpG loci.
Select LCLs_vs_B_cells_CpG_Islands in the spreadsheet tree
Select Find Overlapping Genes from the Analysis section of the workflow
The Output Overlapping Features dialog will open (Figure 1). This dialog allows you to choose the annotation database that will define where genes are located. By default, the promoter region will be defined as 5000 base pairs upstream and 3000 base pairs downstream from the transcription start site.
Figure 1. Selecting Find Overlapping Genes from the main toolbar
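The default promoter definition depends on the strand of the transcript, because "upstream" points toward lower coordinates on the positive strand and toward higher coordinates on the negative strand. The sketch below illustrates this under the default distances; the function names promoter_interval and in_promoter are hypothetical and the coordinates are made up.

```python
def promoter_interval(tss, strand, upstream=5000, downstream=3000):
    """Return (start, end) of the promoter region around a transcription
    start site, taking the transcript strand into account."""
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def in_promoter(position, tss, strand, upstream=5000, downstream=3000):
    """Return True if a CpG position falls inside the promoter region."""
    start, end = promoter_interval(tss, strand, upstream, downstream)
    return start <= position <= end


# Illustrative coordinates: the same CpG position is inside the promoter of a
# negative-strand transcript but outside that of a positive-strand one.
print(in_promoter(14000, tss=10000, strand="-"))
print(in_promoter(14000, tss=10000, strand="+"))
```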
Select Ensembl Transcripts release 75 from the Report regions from the specified database options
You can select a name for the new list; we have named it gene-list
Select OK
A new spreadsheet will be created as a child spreadsheet (Figure 2)
Figure 2. Annotating the differentially methylated CpG loci with genes
Partek Genomics Suite offers several tools to help interpret this list of genes. First, let's look at Gene Set Analysis.
Select Gene Set Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow
Select GO Enrichment for Select the method of analysis
Select Next >
Select 1/mvalue/lcls_vs_b_cells_cpg_islands/gene-list (gene-list.txt) for the source spreadsheet
Select Next >
Select Invoke gene ontology browser on the result and leave the rest of the options set to defaults for Configure the parameters of the test (Figure 3)
Figure 3. Configuring the parameters of the test
Select Next >
Select Default Mapping File for Select the method of mapping genes to genes sets
Select Next >
A new spreadsheet will be created with categories ranked by enrichment score, and the Gene Ontology Browser will launch to display the results of the spreadsheet graphically (Figure 4). The results show which gene sets are overrepresented in the list of genes overlapped by differentially methylated CpG loci between the experimental and control groups.
Figure 4. GO enrichment browser showing gene groups overrepresented in the list of genes which overlap with differentially methylated CpG loci
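The idea of overrepresentation can be illustrated with a generic hypergeometric test, which asks how likely it is to see at least the observed number of gene-list members in a category by chance. This is a textbook sketch, not necessarily the test or the enrichment score that Partek Genomics Suite computes; the function name and all numbers are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(background_size, category_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from background_size genes, category_size of which belong to the category."""
    upper = min(category_size, list_size)
    favorable = sum(
        comb(category_size, k) * comb(background_size - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(background_size, list_size)


# Illustrative numbers: 10 background genes, 5 in the category, a gene list
# of 5 genes that all fall in the category.
print(overrepresentation_pvalue(10, 5, 5, 5))
```

A small p-value means the category contains more genes from the list than expected by chance.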
To get a better idea of whether genes associated with these GO terms have increased or decreased methylation, we can view the Forest Plot.
Select the Forest Plot tab
GO terms are listed by the number of significantly up-regulated genes, with the percent up-regulated and down-regulated shown in red to green bars. Here, we see that most GO terms show increased methylation in their associated genes (Figure 5).
Figure 5. Gene Ontology Forest Plot
Next, we can perform Pathway Analysis to see which pathways are overrepresented in the genes overlapped by differentially methylated CpG loci.
Select gene-list from the spreadsheet tree
Select Pathway Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow
Select Pathway Enrichment for Select the method of analysis
Select Next >
Select 1/mvalue/lcls_vs_b_cells_cpg_islands/gene-list (gene-list.txt) for the source spreadsheet
Select Next >
Leave the default selections for the Configure parameters of the test panel
Select Next >
Leave the default selections for the Result File and Select the parameters panels
Select Next > to run the analysis
The Pathway-Enrichment spreadsheet will be added to the spreadsheet tree in Partek Genomics Suite and the Partek® Pathway™ software will open to provide visualization of the most significantly enriched pathway as a pathway diagram (Figure 6). The color of the gene boxes reflects p-values of the associated differentially methylated CpG loci (bright orange is insignificant, blue is highly significant). The Color by option can be changed to another column from the gene-list.txt spreadsheet, such as Difference.
Figure 6. Partek Pathway illustrating one of the pathways overrepresented in the list of genes overlapping the differentially methylated CpG sites.
The Pathway-Enrichment spreadsheet can also be viewed in Partek Pathway by switching to the Pathway-Enrichment section of the menu tree on the left-hand side of the window. From the spreadsheet view, you can select a pathway name to visualize that pathway. Alternatively, you can open a pathway visualization in Partek Pathway from the Pathway-Enrichment spreadsheet in Partek Genomics Suite by right-clicking on a row and selecting Show pathway... from the pop-up menu. Please note that if you have closed Partek Pathway and reopened it, you will need to import a gene list if you want to color the visualization by attributes from the gene list. For more information about using Partek Pathway, check out our Partek Pathway Tutorial.
Partek Genomics Suite software can view annotation .BED files as tracks in the Genome Viewer. We can add a CpG islands track to the Genome Viewer using the UCSC Genome Browser CpG islands annotation.
Go to the UCSC Genome Browser website
Select Table Browser under Tools in the main command bar of the webpage (Figure 1)
Figure 1. Navigating to the Table Browser at the UCSC Genome Browser website
Configure the Table Browser page as shown (Figure 2)
Figure 2. Configuring the Table Browser to output CpG Islands BED file
Set assembly to Feb. 2009 (GRCh37/hg19)
Set group to Regulation
Set track to CpG Islands
Set table to cpgIslandExt
Set output format to BED
Set output file to cpg.bed
Select get output
The Output cpgIslandExt as BED page will open.
Select get BED to download a compressed folder containing the BED file
Unzip the file using 7-Zip, WinRAR, or a similar program of your choice to a location you will be able to find
Next, we can import the BED file into Partek Genomics Suite.
Select Genomic Database... under Import under File in the main toolbar in Partek Genomics Suite (Figure 3)
Figure 3. Importing the CpG Islands map BED file
Select the file cpg.bed
The BED file will open as a new spreadsheet.
Change the spreadsheet name to UCSC CpG Island Annotation and save it
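If you want to inspect the downloaded file outside Partek Genomics Suite, a BED file is plain tab-delimited text whose first three columns are chromosome, start, and end, with an optional name in the fourth column. The sketch below reads such a file; the function names are hypothetical, the example record is made up, and cpg.bed is simply the file name chosen in the steps above.

```python
def parse_bed_line(line):
    """Split one BED record into (chromosome, start, end, name).

    BED start coordinates are 0-based and end coordinates are exclusive,
    so the region length is end - start.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else ""
    return chrom, start, end, name


def read_bed(path):
    """Read all records from a BED file, skipping blank and header lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            records.append(parse_bed_line(line))
    return records


# Made-up record for illustration only.
print(parse_bed_line("chr1\t100\t250\texample_island\n"))
# Usage with the downloaded file (path is an assumption):
# islands = read_bed("cpg.bed")
```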
The approach described in previous sections relies on ANOVA to detect differentially methylated CpG sites and takes individual sites as a starting point for interpretation. Since ANOVA compares M values at each site independently, this strategy is robust to type I/type II probe bias.
An alternative could be to first summarize all the probes belonging to a CpG island region (i.e. island, N-shore, N-shelf, S-shore, S-shelf) and then use ANOVA to compare regions across the groups. Since the summarization will include both type I and type II probes, you may want to split the analysis in two branches and analyze type I and type II probes independently. To do this, we need to annotate each probe as type I or type II.
Select the mvalue spreadsheet
Select Transform from the main toolbar
Select Create Transposed Spreadsheet... from the Transform drop-down menu (Figure 1)
Figure 1. Creating a transposed spreadsheet
Select Sample ID for Column: and numeric for Data Type:
Select OK
A new temporary spreadsheet will be created with a row for each probe and columns for each sample.
Right-click on column 1. ID to bring up the pop-up menu
Select Insert Annotation
Select Add as categorical
Select Infinium_Design_Type and UCSC_CpG_Islands_Name from the Column Configuration options (Figure 2)
Figure 2. Adding Infinium design type and CpG island annotations
Select OK to add the Infinium design type and UCSC CpG island name as categorical columns on the spreadsheet
Now, we can use the interactive filter to create separate spreadsheets for type I and type II probes.
Select 2. Infinium_Design_Type from the drop-down menu if not selected by default
Left-click the type I column to exclude it
Right-click the temporary spreadsheet in the spreadsheet tree to bring up the pop-up dialog
Select Clone... (Figure 3)
Figure 3. Creating a probe list with only Infinium type II probes
Name the new spreadsheet female_only_typeII_probes
Select OK
Save the created spreadsheet; we chose the file name female_only_typeII_probes
Repeat the process to create a spreadsheet for type I probes
The temporary spreadsheet is no longer needed so we can close it.
We can use these spreadsheets to generate lists of M values at CpG island regions.
Select spreadsheet female_only_typeII_probes
Select Stat from the main toolbar
Select Column Statistics... under Descriptive (Figure 4)
Figure 4. Selecting column statistics
Add Mean to the Selected Measure(s) panel
Select Group By and set it to 3. UCSC_CpG_Islands_Name (Figure 5)
Figure 5. Configuring column statistics
Select OK
The new temporary spreadsheet has one CpG island region per row (Figure 6), samples on columns, and the values in the cells represent the mean of M values of all the CpG probes in the region.
Figure 6. New spreadsheet with average M values for probes at each CpG island; probes not at CpG islands are collected into the first row "- Mean"
Note the first row, with label “– Mean”. It corresponds to all the probes that map outside of UCSC CpG islands. As it is not needed for the downstream analysis, we will remove it.
Right-click on the row header for Mean
Select Delete to remove the row
The final step is to transpose the data back to its original orientation.
Select Transform from the main toolbar
Select Create Transposed Spreadsheet... from the Transform drop-down menu
Select 2. Level for Column: and numeric for Data Type:
Select OK
The layout of the new transposed spreadsheet is as follows: one sample per row with CpG island regions on columns; cell entries correspond to mean methylation status of the region (Figure 7). The column with a blank value for the column header is the average of all probes not associated with CpG island regions. You can delete this column if you like.
Figure 7. Spreadsheet with average M values of probes in each CpG island for each sample
Right-click the transposed spreadsheet, 2_transpose
Select Save as... from the pop-up menu
Name it mvalues_typeII_probes_CpG_islands
The mvalues_typeII_probes_CpG_islands spreadsheet can be used as a starting point for ANOVA and other analyses. You can also repeat the steps above to create an equivalent spreadsheet for type I probes.
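The summarization performed by the Column Statistics steps above amounts to a group-by mean: probes are grouped by their CpG island annotation and their M values are averaged per sample. A minimal sketch, assuming each probe is given as an island name plus one M value per sample (the function name and data are hypothetical):

```python
from collections import defaultdict
from statistics import mean


def mean_m_by_island(probe_rows):
    """Average M values per CpG island and sample.

    probe_rows: iterable of (island_name, [m_value_sample1, m_value_sample2, ...]).
    Probes with an empty island name (outside CpG islands) are skipped.
    """
    grouped = defaultdict(list)
    for island, m_values in probe_rows:
        if island:
            grouped[island].append(m_values)
    # zip(*rows) regroups the values by sample, so each mean is taken
    # over all probes of the island within one sample.
    return {
        island: [mean(per_sample) for per_sample in zip(*rows)]
        for island, rows in grouped.items()
    }


# Made-up example: two probes in one island, one probe outside any island.
rows = [
    ("chr1:100-200", [1.0, 2.0]),
    ("chr1:100-200", [3.0, 4.0]),
    ("", [9.0, 9.0]),
]
print(mean_m_by_island(rows))
```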
The list, LCLs vs B cells, includes differentially methylated loci for locations across the genome; however, in many cases we may want to focus on loci located in particular regions of the genome. To filter our list to include only regions of interest, we can use the annotations provided by Illumina and the interactive filter in Partek Genomics Suite.
Select LCLs_Vs_B_cells from the spreadsheet tree
Right-click on the Gene Symbol column
Select Insert Annotation (Figure 1)
Figure 1. Adding an annotation column to the ANOVA results
Select the Add as categorical option
Select Relation_to_UCSC_CpG_Island (Figure 2)
CpG islands are regions of the genome with an atypically high frequency of CpG sites. CpG islands and their surrounding regions (termed shelf and shore) include many gene promoters and altered methylation in these regions can have a disproportionate effect on gene expression. For example, hyper-methylation of promoter CpG islands is a common mechanism for down-regulating gene expression in cancer.
Figure 2. Adding chromosome location to ANOVA results
Select OK to add Relation_to_UCSC_CpG_Island as a column next to 3. Gene Symbol
Now, we can filter probes by their relation to CpG islands.
Select 4. Relation_to_UCSC_CpG_Island for Column
For categorical columns, the interactive filter displays each category of the selected column as a colored bar. For 4. Relation_to_UCSC_CpG_Island, each bar represents one of the categories of the UCSC annotation. To filter out a category, left-click on its bar. Right-clicking on a bar will include only the selected category. A pop-up balloon will show the category label as you mouse over each bar.
Right-click the Island column to filter out other columns (Figure 3)
Figure 3. Using Interactive Filter tool to filter out probes by annotation. When pointed to a categorical column, the Interactive Filter tool summarises the content of the column by a column chart. Left-click to exclude a category (two columns were excluded, so they are grayed out), right-click to include only the selected category
The yellow and black bar on the right-hand side of the spreadsheet panel shows the fraction of excluded cells in black and included cells in yellow. Right-clicking this bar brings up an option to clear the filter.
Now that we have filtered out probes that are not in CpG islands, we will create a spreadsheet containing only these probes.
Right-click on the LCLs vs. B cells spreadsheet in the spreadsheet tree panel (Figure 4)
Figure 4. Cloning a filtered spreadsheet creates a new spreadsheet with only the included cells
Select Clone
Rename the new spreadsheet LCLs_vs_B_cells_CpG_Islands using the Clone Spreadsheet dialog
Select mvalues from the Create new spreadsheet as a child spreadsheet: drop-down menu (Figure 5)
Select OK
Figure 5. Renaming and configuring filtered spreadsheet
Specify a name for the spreadsheet using the Save File dialog; we chose LCLs_vs_B_cells_CpG_Islands
Select Save to save the spreadsheet
You may want to save the project before proceeding to the next section of the tutorial.
To analyze differences in methylation between our experimental groups, we need to create a list of differentially methylated loci.
Select Create Marker List from the Analysis section of the Illumina BeadArray Methylation workflow
Select LCLs vs. B cells (Figure 1)
Figure 1. Creating a list of significantly differentially methylated loci
Leave Include size of the change selected and set to Change > 2 OR Change < -2
Leave Include significance of the change selected and set to p-value with FDR < 0.05
Select Create
Select Close to exit the list manager
The new spreadsheet LCLs vs. B cells (LCLs vs. B cells) will open in the Analysis tab.
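The two criteria used above (size of the change and FDR-adjusted significance) can be sketched in Python as follows. This is an illustration under stated assumptions, not the software's implementation: it assumes the FDR adjustment is the Benjamini-Hochberg step-up procedure, and the field names `probe`, `fold_change`, and `p_value`, the helper functions, and the example values are all hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def marker_list(results, change_cutoff=2.0, fdr_cutoff=0.05):
    """Keep probes with Change > cutoff OR Change < -cutoff, and FDR < cutoff."""
    adjusted = benjamini_hochberg([r["p_value"] for r in results])
    return [
        r["probe"]
        for r, q in zip(results, adjusted)
        if q < fdr_cutoff
        and (r["fold_change"] > change_cutoff or r["fold_change"] < -change_cutoff)
    ]


# Made-up ANOVA results for five probes.
results = [
    {"probe": "probe_1", "fold_change": 3.1, "p_value": 0.001},
    {"probe": "probe_2", "fold_change": -2.5, "p_value": 0.004},
    {"probe": "probe_3", "fold_change": 1.2, "p_value": 0.0005},
    {"probe": "probe_4", "fold_change": 4.0, "p_value": 0.30},
    {"probe": "probe_5", "fold_change": -1.1, "p_value": 0.80},
]

markers = marker_list(results)
print(markers)
```

In this example probe_3 is significant but changes too little, and probe_4 changes a lot but is not significant, so neither is kept. Only probes that satisfy both criteria end up in the list.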
It is best practice to occasionally save the project you are working on. Let's take the opportunity to do this now.
Select File from the main command toolbar
Select Save Project...
Specify a name for the project using the Save File dialog (we chose Methylation Tutorial)
Select Save to save the project
Saving the project saves the identity and child-parent relationships of all spreadsheets displayed in the spreadsheet tree. This allows us to open all relevant spreadsheets for our analysis by selecting the project file.
We recommend using the default option for normalization; however, advanced users can select their preferred normalization method. Select the () next to each normalization option for details. If you want to import probe intensity, raw probe intensity, probe signals, raw probe signals, or anti-log probe intensity values, they can be added to the data import using the Outputs tab of the Advanced Import Options dialog. Probe intensities and raw probe intensities can be used for advanced troubleshooting purposes and antilog probe intensities can be used for copy number detection. The Outputs tab of the Advanced Import Options dialog also has an option to create NCBI GEO submission spreadsheets from your imported data. For this tutorial, we do not need to import any of these values or create GEO submission spreadsheets.
Sample ID | Cell Type |
---|---|
Sample ID | Gender |
---|---|
The New Track button allows new tracks to be added to the viewer, while the Remove Track button removes the selected track from the viewer. Tracks can be reordered by selecting a track in the Tracks panel and dragging it up or down to move it in the list. In the Chromosome View, select () for selection mode and () for navigation mode. In navigation mode, left-click and draw a box on any track to zoom in. All tracks are synced and will zoom together. Zooming can also be controlled using the interface in the lower right-hand corner of the tab (). View can be reset to the whole chromosome level using reset zoom (). Searching for a gene or transcript in the position box will also zoom directly to its location.
For this region list, you can also calculate the average beta value of the probes in each island per sample and detect differentially methylated CpG island regions. Detailed information on how to get the average beta value for each CpG island can be found in the Determining the average values for a region list section of .
Select () to launch the interactive filter
Close the temporary spreadsheet by selecting it in the file tree and selecting ()
Close the source temporary spreadsheet by selecting it in the spreadsheet tree and selecting ()
Select () from the quick action bar to save the ANOVA-2way (ANOVA Results) spreadsheet with the added annotation
Select () from the quick action bar to invoke the interactive filter
Select () from the quick action bar to save the filtered spreadsheet
Sample ID | Cell Type |
---|---|
GSM2452106_200483200025_R04C01 | B cells |
GSM2452107_200483200021_R01C01 | B cells |
GSM2452108_200483200021_R02C01 | B cells |
GSM2452109_200483200025_R06C01 | B cells |
GSM2452110_200483200025_R07C01 | B cells |
GSM2452111_200483200021_R08C01 | B cells |
GSM2452112_200483200021_R06C01 | B cells |
GSM2452113_200483200021_R04C01 | B cells |
GSM2452114_200483200025_R01C01 | LCLs |
GSM2452115_200483200025_R03C01 | LCLs |
GSM2452116_200483200021_R03C01 | LCLs |
GSM2452117_200483200025_R05C01 | LCLs |
GSM2452118_200483200025_R02C01 | LCLs |
GSM2452119_200483200021_R07C01 | LCLs |
GSM2452120_200483200021_R05C01 | LCLs |
GSM2452121_200483200025_R08C01 | LCLs |
Sample ID | Gender |
---|---|
GSM2452106_200483200025_R04C01 | Female |
GSM2452107_200483200021_R01C01 | Female |
GSM2452108_200483200021_R02C01 | Male |
GSM2452109_200483200025_R06C01 | Female |
GSM2452110_200483200025_R07C01 | Female |
GSM2452111_200483200021_R08C01 | Female |
GSM2452112_200483200021_R06C01 | Female |
GSM2452113_200483200021_R04C01 | Male |
GSM2452114_200483200025_R01C01 | Female |
GSM2452115_200483200025_R03C01 | Female |
GSM2452116_200483200021_R03C01 | Male |
GSM2452117_200483200025_R05C01 | Female |
GSM2452118_200483200025_R02C01 | Female |
GSM2452119_200483200021_R07C01 | Female |
GSM2452120_200483200021_R05C01 | Female |
GSM2452121_200483200025_R08C01 | Male |
Although the 450K and MethylationEPIC arrays were initially designed to analyze DNA methylation, they are essentially dense SNP arrays and can be used for copy number analysis (Feber et al. 2014). The probe intensity data are easily parsed from the .idat files by using the Additional Probe Data Spreadsheet Selection dialog (Figure 1) when importing the raw data. Examining the raw intensity data can also be useful for QA/QC purposes.
Follow the steps for importing Illumina methylation data detailed in Import and normalize methylation data until you reach the Import Illumina iDAT Data dialog with Manifest File and Output File panels (Figure 1).
Figure 1. Customizing output during data import
Select Customize... to open the Advanced Import Options dialog
Choose No normalization in the Normalization section of the Algorithm tab
Select the Outputs tab (Figure 2)
Figure 2. Selecting additional probe data to include during data import
Detection p-values. This is the confidence score that the signal of a probe was significantly higher than the background defined by negative control probes. Selecting this checkbox produces a spreadsheet ending with '_detectionp' in addition to the spreadsheet containing beta values. Each row of the _detectionp spreadsheet will be a different sample and the sample names will end in '_detectionp'. This spreadsheet can be used to filter out probes that do not show signal above background.
Probe Intensity. This is the sum of the methylated and unmethylated intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_probe’ in addition to the spreadsheet containing beta values. Each row of the _probe spreadsheet will be a different sample and the file names will also end in ‘_probe.’ The probe intensity values will be log2 transformed by default (note that the beta values are not log2 transformed).
Probe Signal. This option will become available if Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_probe.’ The methylated and unmethylated intensities are shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_meth’ or ‘_unmeth’ for methylated and unmethylated values, respectively. The probe intensity values will be log2 transformed by default.
Raw Probe Intensity. This is the sum of the raw red and green signal intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_raw’ in addition to the spreadsheet containing beta values. Each row of the spreadsheet will be a different sample and the file names will also end in ‘_raw.’ The raw probe intensity values will be log2 transformed by default.
Raw Probe Signal. This option will become available if Raw Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_raw.’ The red and green intensities will be shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_red’ or ‘_green’ for red and green values, respectively. The raw probe intensity values will be log2 transformed by default.
Antilog Probe Intensity Values. Selecting this checkbox will show the probe intensity data without log2 transformation.
Create NCBI GEO Submission Spreadsheets. Generates matrix processed and matrix signal intensities spreadsheets for GEO submission.
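To make the relationship between these outputs concrete, here is a minimal Python sketch of how a probe intensity value follows from the methylated and unmethylated signals, and how detection p-values can be used to flag probes without signal above background. The function names, the example signal values, and the 0.01 detection threshold are assumptions chosen for illustration; they are not taken from the software.

```python
import math


def probe_intensity(methylated, unmethylated, log2=True):
    """Sum of methylated and unmethylated signal for one probe.

    The sum is log2-transformed by default; log2=False corresponds to
    requesting antilog (untransformed) probe intensity values.
    """
    total = methylated + unmethylated
    return math.log2(total) if log2 else total


def passes_detection(detection_p, threshold=0.01):
    """True if the probe signal is above background.

    The 0.01 threshold is a hypothetical choice for this sketch.
    """
    return detection_p < threshold


# Made-up signals for a single probe in a single sample.
meth, unmeth = 3072, 1024

log_intensity = probe_intensity(meth, unmeth)               # log2 scale
raw_intensity = probe_intensity(meth, unmeth, log2=False)   # antilog scale

print(log_intensity, raw_intensity)
print(passes_detection(0.002), passes_detection(0.2))
```

The antilog value is simply 2 raised to the log2 value, which is why the untransformed intensities are the natural input for ratio-based copy number calculations.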
How you proceed depends on your study design. Here is an example series of steps to prepare the tutorial data set for copy number analysis:
Select Probe Intensity and Antilog Probe Intensity Values (Figure 2)
Select OK to close the Advanced Import Options dialog
Select Import to import the data and perform the selected normalization method
Select the (_probe) spreadsheet from the spreadsheet tree
Delete any samples with _detectionp names
Create sample attributes and assign samples to the groups as described in Annotate samples
Select Transform from the main toolbar
Select Normalize to baseline
Configure the Normalize to Baseline 1 dialog as shown (Figure 3)
Select Use control set from this spreadsheet
Set Control Category to B cells
Select Ratio to baseline from the Normalization Method section
Select After ratio apply log base 2
Select New Spreadsheet from the Configure Output section
Figure 3. Configuring normalize to baseline
Select OK to generate the spreadsheet
This spreadsheet contains copy number values per probe in log2 space (i.e. diploid = 0). Prior to performing copy number analysis, you can normalize for local GC abundance.
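The normalize-to-baseline settings chosen above amount to dividing each probe intensity by a baseline derived from the control samples and then taking log base 2 of the ratio. The Python sketch below illustrates that arithmetic. It assumes the baseline is the per-probe mean of the B cell samples, which may differ from how the software computes it; the sample names, intensity values, and the function `normalize_to_baseline` are hypothetical.

```python
import math
from statistics import mean


def normalize_to_baseline(intensities, groups, control="B cells"):
    """Return log2(sample / baseline) per probe for every sample.

    intensities: {sample name: [antilog intensity per probe]}
    groups:      {sample name: group label}
    The baseline is assumed to be the per-probe mean of the control samples.
    """
    control_samples = [s for s, g in groups.items() if g == control]
    n_probes = len(next(iter(intensities.values())))
    baseline = [
        mean(intensities[s][j] for s in control_samples) for j in range(n_probes)
    ]
    return {
        sample: [math.log2(v / b) for v, b in zip(values, baseline)]
        for sample, values in intensities.items()
    }


# Made-up antilog probe intensities for two probes in three samples.
intensities = {
    "B_1": [1000, 2000],
    "B_2": [3000, 2000],
    "LCL_1": [4000, 1000],
}
groups = {"B_1": "B cells", "B_2": "B cells", "LCL_1": "LCLs"}

log_ratios = normalize_to_baseline(intensities, groups)
print(log_ratios["LCL_1"])
```

A value of 0 means the probe intensity equals the baseline (two copies in a diploid reference), positive values suggest a gain, and negative values suggest a loss.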
Select Transform
Select Adjust Based on Local GC Content...
Click OK to run Local GC Adjustment (Figure 4)
Figure 4. Adjusting for local GC content
The GC adjusted spreadsheet is the starting spreadsheet for copy number analysis. You can now switch over to the Copy number workflow, skip the Create copy number step, and begin with the Detect amplifications and deletions step. Consult the user's guide for the copy number workflow for subsequent steps.
Feber A, Guilhamon P, Lechner M, et al. Using high-density DNA methylation arrays to profile copy number alterations. Genome Biology. 2014;15(2):R30. doi:10.1186/gb-2014-15-2-r30.
An Illumina-type project file (.bsc format) can be imported in Illumina’s GenomeStudio® (please note: to process 450K chips, you need GenomeStudio 2010 or newer) and exported using the Partek Methylation Plug-in for GenomeStudio. For more information on the plug-in, please see the . The plug-in creates six files: a Partek project file (*.ppj), an annotation file (*.annotation.txt), files containing intensity values (*.fmt and *.txt), and files containing β-values (*.fmt and *.txt) (Figure 1).
Figure 1. Output of Partek Methylation Plug-in for GenomeStudio
To load all the files automatically, open the .ppj file as follows.
Select Methylation from the Workflows drop-down menu
Select Illumina BeadArray Methylation from the Methylation sub-workflows section
Select Import Illumina Methylation Data from the Import section
Select Load a project following Illumina GenomeStudio export from the Load Methylation Data dialog
Information about the different output options can be found by selecting the adjacent () icon.