Import and normalize methylation data

To follow this tutorial, download the 32 .idat files (two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program. The .idat files can be downloaded in a zipped folder using this link - Differential Methylation Analysis data set.
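Before importing, it can be worth confirming that every array has both of its .idat files. The sketch below is not part of Partek Genomics Suite; it assumes the usual Illumina naming convention in which the two files of an array share a prefix and end in _Grn.idat and _Red.idat. The function name and the example file names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def pair_idat_files(filenames):
    """Group .idat file names by array, keyed on the text before _Grn/_Red.

    Returns (complete, incomplete): sorted lists of array prefixes that have
    both channels and of prefixes that are missing one channel.
    """
    channels_by_array = defaultdict(set)
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() != ".idat":
            continue  # ignore anything that is not an .idat file
        prefix, _, channel = path.stem.rpartition("_")
        if channel in ("Grn", "Red"):
            channels_by_array[prefix].add(channel)
    both = {"Grn", "Red"}
    complete = sorted(p for p, ch in channels_by_array.items() if ch == both)
    incomplete = sorted(p for p, ch in channels_by_array.items() if ch != both)
    return complete, incomplete


# Hypothetical file names, for illustration only.
example_files = [
    "200123450001_R01C01_Grn.idat",
    "200123450001_R01C01_Red.idat",
    "200123450001_R02C01_Grn.idat",
]
complete_arrays, incomplete_arrays = pair_idat_files(example_files)
```

For the tutorial data set, 16 complete arrays and no incomplete ones would be expected.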

  • Store the 32 .idat files in C:\Partek Training Data\Methylation or in a directory of your choosing. We recommend creating a dedicated folder for the tutorial.

  • From the Workflows drop-down list, select Methylation (Figure 1)

Figure 1. Selecting the methylation workflow

  • Select Microarray Loci Methylation from the Methylation sub-workflows panel (Figure 2)

Figure 2. Selecting the Illumina BeadArray Methylation workflow

  • This opens the Illumina BeadArray Methylation workflow (Figure 3)

Figure 3. Illumina BeadArray Methylation workflow

  • Select Import Illumina Methylation Data to bring up the Load Methylation Data dialog

  • Select Import human methylation 450/850 .idat files (Figure 4)

Figure 4. Selecting human methylation 450/850 .idat file type for import

  • Select OK

  • Select Browse... to navigate to the folder where you stored the .idat files

All .idat files in the folder will be selected by default (Figure 5).

Figure 5. Selecting .idat files to import

  • Select Add File(s) > to move the files to the idat Files to Process pane of the Import Illumina iDAT Data dialog (Figure 6)

Figure 6. Confirming selection of .idat files for import

  • Select Next >

The next dialog (Figure 7) deals with the manifest file, i.e. the probe annotation file. If a manifest file is not present locally, it will be downloaded to the Microarray libraries directory automatically. The download takes place in the background with no message on the screen, and it may take a few minutes depending on the internet connection. In the future, you may want to reanalyze a data set using the same version of the manifest file used during the initial analysis rather than downloading an up-to-date version. To facilitate this, the Manual specify option in the Manifest File section allows you to select a specific version. For this tutorial, we will leave the default settings.

Figure 7. Selecting manifest file and output file

By default, the output file destination is set to the folder containing your .idat files and the output file name matches the folder name. The name and location of the output file can be changed using the Output File panel.

  • Select Customize to view advanced options for data normalization

In the Algorithm tab of the Advanced Import Options dialog (Figure 8), there are two filtering options and five normalization options available. The filters allow you to exclude probes on the X and Y chromosomes or probes that fail a detection p-value threshold. In this tutorial, we have male and female samples, so we will apply the X and Y chromosome filter. We will also filter probes based on detection p-value to exclude low-quality probes.

  • Select Exclude X and Y Chromosomes

Analysis of differentially methylated loci in humans and mice often excludes probes on the X and Y chromosomes because of the difficulties caused by the inactivation of one X chromosome in female samples.

  • Select Exclude probes using detection p-value and leave the default settings of 0.05 and 1 out of 16 samples.
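The logic of the detection p-value filter can be sketched as follows. This is an illustration, not Partek Genomics Suite code: it assumes that a probe is excluded when its detection p-value exceeds the threshold (0.05) in at least the stated number of samples (1 of 16). The function name, the probe identifiers, and the values are hypothetical.

```python
def failing_probes(detection_p, p_cutoff=0.05, min_failed_samples=1):
    """Return IDs of probes whose detection p-value exceeds p_cutoff
    in at least min_failed_samples samples.

    detection_p maps a probe ID to a list of p-values, one per sample.
    """
    excluded = []
    for probe_id, p_values in detection_p.items():
        failed = sum(1 for p in p_values if p > p_cutoff)
        if failed >= min_failed_samples:
            excluded.append(probe_id)
    return excluded


# Hypothetical detection p-values for three probes across four samples.
example_detection_p = {
    "probe_A": [0.001, 0.002, 0.000, 0.010],  # passes in every sample
    "probe_B": [0.001, 0.200, 0.003, 0.004],  # fails in one sample
    "probe_C": [0.300, 0.400, 0.090, 0.060],  # fails in every sample
}
excluded_probes = failing_probes(example_detection_p)
```

Raising min_failed_samples makes the filter more permissive, because a probe must then fail in more samples before it is dropped.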

We recommend using the default option for normalization; however, advanced users can select their preferred normalization method. Select the () next to each normalization option for details. If you want to import probe intensity, raw probe intensity, probe signals, raw probe signals, or anti-log probe intensity values, they can be added to the data import using the Outputs tab of the Advanced Import Options dialog. Probe intensities and raw probe intensities can be used for advanced troubleshooting purposes and antilog probe intensities can be used for copy number detection. The Outputs tab of the Advanced Import Options dialog also has an option to create NCBI GEO submission spreadsheets from your imported data. For this tutorial, we do not need to import any of these values or create GEO submission spreadsheets.

Figure 8. Advanced Import Options offers choice of normalization method and additional data outputs

  • Select OK to close the Advanced Import Options dialog

  • Select Import on the Import Illumina iDAT data dialog

The imported and normalized data will appear as spreadsheet 1 (Methylation Tutorial) (Figure 9).

Figure 9. Viewing the imported methylation data in a spreadsheet

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Differential Methylation Analysis

Illumina’s MethylationEPIC array interrogates the methylation status of over 850,000 cytosines in the human genome. Because the MethylationEPIC array is closely related to the Infinium HumanMethylation450 BeadChip, the steps presented in this document can be applied to either platform.

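This tutorial illustrates how to:

  • Import and normalize methylation data

  • Annotate samples

  • Perform data quality analysis and quality control

  • Detect differentially methylated loci

  • Create a marker list

  • Filter loci with the interactive filter

  • Obtain methylation signatures

  • Visualize methylation at each locus

  • Perform gene set and pathway analysis

  • Detect differentially methylated CpG islands

  • Optional: Add UCSC CpG island annotations

  • Optional: Use MethylationEPIC for CNV analysis

  • Optional: Import a Partek Project from Genome Studio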

Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

Description of the Data Set

The data set accompanying this document consists of sixteen human samples processed by Illumina MethylationEPIC arrays. The data set is taken from a study of DNA methylation in human B cells and B cells infected with Epstein-Barr virus (EBV).

Infecting B cells with EBV in vitro transforms them, making them capable of indefinite growth in vitro. These immortalized cell lines are referred to as lymphoblastoid cell lines (LCLs). LCLs behave similarly to activated B cells, making them useful for expanding T cells in vitro. Because EBV is a carcinogen and immortalized cell growth is a hallmark of cancer, examining the effects of EBV transformation on B cell DNA methylation might shed light on the roles of DNA methylation in tumor development.

The data files can be downloaded from Gene Expression Omnibus using accession number GSE93373 or by selecting this link - Differential Methylation Analysis data set. To follow this tutorial, download the 32 .idat files (two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program.


Perform data quality analysis and quality control

Principal component analysis (PCA) can be performed to visualize clusters in the methylation data, but also serves as a quality control procedure; outliers within a group could suggest poor data quality, batch effects, mislabeled samples, or uninformative groupings.

  • Select PCA Scatter Plot from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Scatter Plot tab

  • Select 2. Cell Type for Color by

Create a marker list

To analyze differences in methylation between our experimental groups, we need to create a list of differentially methylated loci.

  • Select Create Marker List from the Analysis section of the Illumina BeadArray Methylation workflow

  • Select LCLs vs. B cells (Figure 1)

Figure 1. Creating a list of significantly differentially methylated loci

Optional: Import a Partek Project from Genome Studio

An Illumina-type project file (.bsc format) can be imported into Illumina’s GenomeStudio® (please note: to process 450K chips, you need GenomeStudio 2010 or newer) and exported using the Partek Methylation Plug-in for GenomeStudio. For more information on the plug-in, please see the plug-in user guide. The plug-in creates six files: a Partek project file (*.ppj), an annotation file (*.annotation.txt), files containing intensity values (*.fmt and *.txt), and files containing β-values (*.fmt and *.txt) (Figure 1).

Figure 1. Output of Partek Methylation Plug-in for GenomeStudio

To load all the files automatically, open the .ppj file as follows.

  • Select Methylation from the Workflows drop-down menu

  • Leave Include size of the change selected and set to Change > 2 OR Change < -2

  • Leave Include significance of the change selected and set to p-value with FDR < 0.05

  • Select Create

  • Select Close to exit the list manager

The new spreadsheet LCLs vs. B cells (LCLs vs. B cells) will open in the Analysis tab.
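The two marker list criteria can be illustrated outside the software. The sketch below keeps loci whose change exceeds 2 in either direction and whose FDR-adjusted p-value is below 0.05. The section does not state which FDR procedure Partek Genomics Suite applies, so the Benjamini-Hochberg step-up adjustment used here is an assumption; the function names, locus names, and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value never exceeds the adjusted value of a larger p-value.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def marker_list(rows, change_cutoff=2.0, fdr_cutoff=0.05):
    """Filter (locus, change, p_value) rows on change size and FDR."""
    adjusted = benjamini_hochberg([p for _, _, p in rows])
    return [
        locus
        for (locus, change, _), q in zip(rows, adjusted)
        if abs(change) > change_cutoff and q < fdr_cutoff
    ]


# Hypothetical ANOVA output: (locus, change, unadjusted p-value).
example_rows = [
    ("locus_1", 3.0, 0.01),
    ("locus_2", -2.5, 0.04),
    ("locus_3", 0.5, 0.03),
    ("locus_4", 4.0, 0.50),
]
adjusted_p = benjamini_hochberg([p for _, _, p in example_rows])
selected = marker_list(example_rows)
```

Note that locus_2 has an unadjusted p-value below 0.05 but does not survive the adjustment, which is why the filter is applied to FDR-adjusted values rather than raw p-values.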

It is best practice to occasionally save the project you are working on. Let's take the opportunity to do this now.

  • Select File from the main command toolbar

  • Select Save Project...

  • Specify a name for the project using the Save File dialog; we chose Methylation Tutorial

  • Select Save to save the project

Saving the project saves the identity and child-parent relationships of all spreadsheets displayed in the spreadsheet tree. This allows us to open all relevant spreadsheets for our analysis by selecting the project file.

  • Select Illumina BeadArray Methylation from the Methylation sub-workflows section

  • Select Import Illumina Methylation Data from the Import section

  • Select Load a project following Illumina GenomeStudio export from the Load Methylation Data dialog


  • Select 3. Gender for Size by

  • Select () to enable Rotate Mode

  • Left click and drag to rotate the plot and view different angles (Figure 1)

    Each dot of the plot is a single sample and represents the average methylation status across all CpG loci. Two of the LCLs samples do not cluster with the others, but we will not exclude them for this tutorial.

    Figure 1. Principal components analysis (PCA) showing methylation profiles of the study samples. Each sample is represented by a dot and the axes are the first three PCs; the numbers in parentheses indicate the fraction of variance explained by each PC. The number at the top is the variance explained by the first three PCs combined. The samples are colored by levels of 2. Cell Type

    Next, the distribution of beta values across the samples can be inspected with a box-and-whiskers plot.

    • Select Sample Box and Whiskers Chart from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Box and Whiskers tab

    Each box-and-whisker is a sample and the y-axis shows beta-value ranges. Samples in this data set seem reasonably uniform (Figure 2).

    Figure 2. Box and whiskers plot showing distribution of beta values (y-axis) across the study samples (x-axis). Samples are colored by a categorical attribute (Cell Type). The middle line is the median, the box represents the upper and lower quartiles, and the whiskers correspond to the 90th and 10th percentiles of the data
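The five numbers drawn for each sample in this kind of plot (10th percentile, lower quartile, median, upper quartile, 90th percentile) can be computed as in the sketch below. It uses linear interpolation between ranks, which is an assumption; the software may use a different percentile convention. The function names and example values are hypothetical.

```python
def percentile(sorted_values, pct):
    """Linear-interpolation percentile of an already sorted list (pct 0..100)."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    low_value = sorted_values[lower]
    return low_value + (sorted_values[upper] - low_value) * fraction


def box_summary(values):
    """Return the statistics a box-and-whiskers plot shows for one sample."""
    ordered = sorted(values)
    levels = (("p10", 10), ("q1", 25), ("median", 50), ("q3", 75), ("p90", 90))
    return {name: percentile(ordered, pct) for name, pct in levels}


# Hypothetical values for one sample: 0.0, 0.1, ..., 1.0.
example_values = [i / 10 for i in range(11)]
summary = box_summary(example_values)
```

Comparing these summaries across samples is a numeric counterpart to checking the plot for a sample whose box or whiskers sit apart from the rest.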

    An alternative way to take a look at the distribution of beta-values is a histogram.

    • Select Sample Histogram from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Histogram tab

    Again, no sample in the tutorial data set stands out (Figure 3).

    Figure 3. Sample histogram. Each sample is a line, beta values are on the horizontal axis and their frequencies on the vertical axis. Two peaks correspond to two probe types (I and II) present on the MethylationEPIC array. Sample colors correspond to a categorical attribute (Cell Type)


    Obtain methylation signatures

    The significant CpG loci detected in the previous step actually form a methylation signature that differentiates between LCLs and B cells. We can build and visualize this methylation signature using clustering and a heat map.

    • Select the LCLs_vs_Bcells_CpG_Islands spreadsheet in the spreadsheet pane on the left

    • Select Cluster Based on Significant Genes from the Visualization panel of the Illumina BeadArray Methylation workflow

    • Select Hierarchical Clustering for Specify Method (Figure 1)

    Figure 1. Selecting Hierarchical Clustering for clustering method

    • Select OK

    • Verify that LCLs_vs_Bcells_CpG_Islands is selected in the drop-down menu

    • Verify that Standardize is selected for Expression normalization (Figure 2)

    Figure 2. Selecting spreadsheet and normalization method for clustering

    • Select OK

    The heat map will be displayed on the Hierarchical Clustering tab (Figure 3).

    Figure 3. Hierarchical clustering with heat map invoked on a list of significant CpG loci

    The experimental groups are rows, while the CpG loci from the LCLs vs B cells spreadsheet are columns. Methylation levels are compared between the LCLs and B cells groups. CpG loci with higher methylation are colored red, and CpG loci with lower methylation are colored green. LCLs samples are colored orange and B cells samples are colored red in the dendrogram on the left-hand side of the heat map.


    Optional: Add UCSC CpG island annotations

    Partek Genomics Suite software can view annotation .BED files as tracks in the Genome Viewer. We can add a CpG islands track to the Genome Viewer using the UCSC Genome Browser CpG islands annotation.

    • Go to the UCSC Genome Browser

    • Select Table Browser under Tools in the main command bar of the webpage (Figure 1)

    Figure 1. Navigating to the Table Browser at the UCSC Genome Browser website

    • Configure the Table Browser page as shown (Figure 2)

    Figure 2. Configuring the Table Browser to output CpG Islands BED file

    • Set assembly to Feb. 2009 (GRCh37/hg19)

    • Set group to Regulation

    • Set track to CpG Islands

    The Output cpgIslandExt as BED page will open.

    • Select get BED to download a compressed folder containing the BED file

    • Unzip the file using 7-Zip, WinRAR, or a similar program of your choice to a location you will be able to find

    Next, we can import the BED file into Partek Genomics Suite.

    • Select Genomic Database... under Import under File in the main toolbar in Partek Genomics Suite (Figure 3)

    Figure 3. Importing the CpG Islands map BED file

    • Select the file cpg.bed

    The BED file will open as a new spreadsheet.

    • Change the spreadsheet name to UCSC CpG Island Annotation and save it
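For readers who want to inspect the downloaded file outside Partek Genomics Suite, the sketch below reads the first four BED columns (chromosome, start, end, name). BED coordinates are 0-based and half-open, so the length of a region is end minus start. The function name and the example records are hypothetical and do not come from the actual cpg.bed file.

```python
import csv
import io


def read_bed(handle):
    """Read (chrom, start, end, name) tuples from a tab-delimited BED handle."""
    regions = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and optional header lines that some exports add.
        if not fields or fields[0].startswith(("track", "browser", "#")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        regions.append((chrom, start, end, name))
    return regions


# Hypothetical BED content, for illustration only.
example_bed = io.StringIO(
    "track name=example\n"
    "chr1\t1000\t1500\tisland_1\n"
    "chr2\t200\t260\tisland_2\n"
)
bed_regions = read_bed(example_bed)
region_lengths = [end - start for _, start, end, _ in bed_regions]
```

To read a file on disk, pass an open file object, for example open("cpg.bed", newline=""), in place of the StringIO object.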

    For this region list, you can also calculate the average beta value of the probes in each island per sample and detect differentially methylated CpG island regions. Detailed information on how to get the average beta value for each CpG island can be found in the Determining the average values for a region list section of Starting with a list of genomic regions.


    Detect differentially methylated loci

    To detect differential methylation between CpG loci in different experimental groups, we can perform an ANOVA test. For this tutorial, we will perform a simple two-way ANOVA to compare the methylation states of the two experimental groups.

    • Select Detect Differential Methylation from the Analysis section of the Illumina BeadArray Methylation workflow

    A new child spreadsheet, mvalue, is created when Detect Differential Methylation is selected. M-values are an alternative metric for measuring methylation. β-values can be easily converted to M-values using the following equation: M-value = log2( β / (1 - β)).

    An M-value close to 0 for a CpG site indicates a similar intensity between the methylated and unmethylated probes, which means the CpG site is about half-methylated. Positive M-values mean that more molecules are methylated than unmethylated, while negative M-values mean that more molecules are unmethylated than methylated.
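The conversion between the two metrics can be checked with a few lines of code. The sketch below implements the equation given above, M-value = log2(β / (1 - β)), together with its inverse; the function names are hypothetical. The conversion is undefined for β values of exactly 0 or 1, which the sketch does not handle.

```python
import math


def beta_to_m(beta):
    """Convert a beta value (strictly between 0 and 1) to an M-value."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m_value):
    """Convert an M-value back to a beta value."""
    return 2.0 ** m_value / (2.0 ** m_value + 1.0)


half_methylated = beta_to_m(0.5)    # equal methylated and unmethylated signal
mostly_methylated = beta_to_m(0.8)  # positive M-value
mostly_unmethylated = beta_to_m(0.2)  # negative M-value
```

A β value of 0.5 maps to an M-value of 0, and β values of 0.8 and 0.2 map to M-values of about 2 and -2, matching the interpretation of positive and negative M-values given above.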

    Visualize methylation at each locus

    Partek Genomics Suite enables you to visualize each probe and compare the methylation between the groups at a single CpG site level.

    • Right-click row 5. SBNO2 in the LCLs_vs_B_Cells_CpG_Islands spreadsheet

    • Select Browse to Location from the pop-up menu

  • Set table to cpgIslandExt
  • Set output format to BED

  • Set output file to cpg.bed

  • Select get output

    As discussed by Du and colleagues, the β-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels.

    Because we are performing differential methylation analysis, Partek Genomics Suite automatically creates an M-values spreadsheet to use for statistical analysis.

    • Select 2. Cell Type and 3. Gender from the Experimental Factor(s) panel

    • Select Add Factor > to move 2. Cell Type and 3. Gender to the ANOVA Factor(s) panel (Figure 1)

    Figure 1. ANOVA setup dialog. Experimental factors listed on the left can be added to the ANOVA model.

    • Select Contrasts...

    • Leave Data is already log transformed? set to No

    • Leave Report comparisons as set to Difference

    For methylation data, fold-change comparisons are not appropriate. Instead, comparisons should be reported as the difference between groups.

    • Select 2. Cell Type from the Select Factor/Interaction drop-down menu

    • Select LCLs

    • Select Add Contrast Level > for the upper group

    • Select B cells

    • Select Add Contrast Level > for the lower group

    • Select Add Contrast (Figure 2)

    Figure 2. Configuring ANOVA contrasts

    • Select OK to close the Configuration dialog

    The Contrasts... button of the ANOVA dialog now reads Contrasts Included

    • Select OK to close the ANOVA dialog and run the ANOVA

    If this is the first time you have analyzed a MethylationEPIC array using the Partek Genomics Suite software, the manifest file may need to be configured. If it needs configuration, the Configure Annotation dialog will appear (Figure 3).

    • Select Chromosome is in one column and the physical location is in another column for Choose the column configuration

    • Select Ilmn ID for Marker ID

    • Select CHR for Chromosome

    • Select MAPINFO for Physical Position

    • Select Close

    This enables Partek Genomics Suite to parse out probe annotations from the manifest file.

    Figure 3. Processing the annotation file. User needs to point to the columns of the annotation file that contain the probe identifier as well as the chromosome and coordinates of the probe.

    The results will appear as ANOVA-2way (ANOVAResults), a child spreadsheet of mvalue. Each row of the spreadsheet represents a single CpG locus (identified by Column ID).

    Figure 4. ANOVA spreadsheet. Each row is a result of an ANOVA at a given CpG locus (identified by the Column ID column). The remaining columns contain annotation and statistical output

    For each contrast, a p-value, Difference, Difference (Description), Beta Difference, and Beta Difference (Description) are generated. The Difference column reports the difference in M-values between the two groups while the Beta Difference column reports the difference in beta values between the two groups.

    Figure 1. Browsing to location from spreadsheet with differentially expressed genes

    The Chromosome View tab will open, zoomed in to the selected CpG locus in SBNO2 (Figure 2).

    Figure 2. Viewing location in Genome Viewer

    The Chromosome View visualization is composed of a series of tracks corresponding to annotation files and data files.

    • RefSeq Transcripts 2017-05-02 (hg19) (+): transcripts coded by the positive strand

    • RefSeq Transcripts 2017-05-02 (hg19) (-): transcripts coded by the negative strand

    • Regions: by default, difference in methylation (M-value) between the groups

    • Heatmap (1/mvalue): M values for all the samples

    • Barchart (Methylation): methylation level in M value of the selected sample (to select a sample, click on a heat map)

    • Heatmap (Methylation Tutorial): Beta values for all the samples

    • Barchart (Methylation): methylation level in Beta value of the selected sample (to select a sample, click on a heat map)

    • Cytoband: cytobands of the current chromosome

    • Genomic Label: coordinates on the current chromosome

    To modify a track, select it in the Tracks panel to bring up its configuration options panel below the Tracks panel. Let's modify a few tracks to improve our visualization of the data.

    • Select the Regions track; its configuration panel opens to the Profile tab

    • Select Color tab

    • Set Color bars by to Difference (LCLs vs. B cells) (Description)

    • Select Apply to change

    This colors each region according to whether its methylation is up or down.

    • Select the Heatmap (1/mvalue)

    • Select Remove Track

    • Select Bar Chart (Methylation) located directly below the Regions track

    • Select Remove Track

    We can now more clearly see the Difference in M values for the region in the Regions track, the heatmap of beta values in the Heatmap track, and the beta value for the loci of the selected sample in the Bar Chart track.

    • Select a sample on the heatmap to view its beta value in the Bar Chart track (Figure 3)

    Figure 3. Modify the tracks of the Genome Viewer to facilitate visual analysis

    The New Track button allows new tracks to be added to the viewer, while the Remove Track button removes the selected track from the viewer. Tracks can be reordered by selecting a track in the Tracks panel and dragging it up or down to move it in the list. In the Chromosome View, select () for selection mode and () for navigation mode. In navigation mode, left-click and draw a box on any track to zoom in. All tracks are synced and will zoom together. Zooming can also be controlled using the interface in the lower right-hand corner of the tab (). View can be reset to the whole chromosome level using reset zoom (). Searching for a gene or transcript in the position box will also zoom directly to its location.

    The available tracks can be supplemented with a special annotation file that can be built using a UCSC annotation file as the basis. Building and viewing the UCSC annotation file is available as an optional section of the tutorial, Optional: Add UCSC CpG island annotations.


    Detect differentially methylated CpG islands

    The approach described in previous sections relies on ANOVA to detect differentially methylated CpG sites and takes individual sites as a starting point for interpretation. Since ANOVA compares M values at each site independently, this strategy is robust to type I/type II probe bias.

    An alternative could be to first summarize all the probes belonging to a CpG island region (i.e. island, N-shore, N-shelf, S-shore, S-shelf) and then use ANOVA to compare regions across the groups. Since the summarization will include both type I and type II probes, you may want to split the analysis in two branches and analyze type I and type II probes independently. To do this, we need to annotate each probe as type I or type II.

    • Select the mvalue spreadsheet

    • Select Transform from the main toolbar

    • Select Create Transposed Spreadsheet... from the Transform drop-down menu (Figure 1)

    Figure 1. Creating a transposed spreadsheet

    • Select Sample ID for Column: and numeric for Data Type:

    • Select OK

    A new temporary spreadsheet will be created with a row for each probe and columns for each sample.

    • Right-click on column 1. ID to bring up the pop-up menu

    • Select Insert Annotation

    • Select Add as categorical

    Figure 2. Adding Infinium design type and CpG island annotations

    • Select OK to add the Infinium design type and UCSC CpG island name as categorical columns on the spreadsheet

    Now, we can use the interactive filter to create separate spreadsheets for type I and type II probes.

    • Select () to launch the interactive filter

    • Select 2. Infinium_Design_Type from the drop-down menu if not selected by default

    • Left-click the type I column to exclude it

    Figure 3. Creating a probe list with only Infinium type II probes

    • Name the new spreadsheet female_only_typeII_probes

    • Select OK

    • Save the created spreadsheet; we chose the file name female_only_typeII_probes

    The temporary spreadsheet is no longer needed so we can close it.

    • Close the temporary spreadsheet by selecting it in the file tree and selecting ()

    We can use these spreadsheets to generate lists of M values at CpG island regions.

    • Select spreadsheet female_only_typeII_probes

    • Select Stat from the main toolbar

    • Select Column Statistics... under Descriptive (Figure 4)

    Figure 4. Selecting column statistics

    • Add Mean to the Selected Measure(s) panel

    • Select Group By and set it to 3. UCSC_CpG_Islands_Name (Figure 5)

    Figure 5. Configuring column statistics

    • Select OK

    The new temporary spreadsheet has one CpG island region per row (Figure 6), samples on columns, and the values in the cells represent the mean of M values of all the CpG probes in the region.

    Figure 6. New spreadsheet with average M values for probes at each CpG island; probes not at CpG islands are collected into the first row "- Mean"

    Note the first row, with label “– Mean”. It corresponds to all the probes that map outside of UCSC CpG islands. As it is not needed for the downstream analysis, we will remove it.

    • Right-click on the row header for Mean

    • Select Delete to remove the row

    The final step is to transpose the data back to its original orientation.

    • Select Transform from the main toolbar

    • Select Create Transposed Spreadsheet... from the Transform drop-down menu

    • Select 2. Level for Column: and numeric for Data Type:

    The layout of the new transposed spreadsheet is as follows: one sample per row with CpG island regions on columns; cell entries correspond to mean methylation status of the region (Figure 7). The column with a blank value for the column header is the average of all probes not associated with CpG island regions. You can delete this column if you like.

    Figure 7. Spreadsheet with average M values of probes in each CpG island for each sample
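The summarization performed by the column statistics and transpose steps can be sketched in code: average the M values of all probes that share a CpG island name, drop the probes with no island, and arrange the result with one row per sample. This is an illustration under assumed data structures, not the software's implementation; the function names, island names, and values are hypothetical.

```python
from collections import defaultdict


def mean_m_by_island(probe_rows):
    """Average M values per CpG island.

    probe_rows is an iterable of (island_name, [M value per sample]).
    Probes with an empty island name lie outside islands and are skipped.
    """
    totals = {}
    counts = defaultdict(int)
    for island, values in probe_rows:
        if not island:
            continue
        if island not in totals:
            totals[island] = [0.0] * len(values)
        for sample_index, value in enumerate(values):
            totals[island][sample_index] += value
        counts[island] += 1
    return {
        island: [total / counts[island] for total in sums]
        for island, sums in totals.items()
    }


def samples_on_rows(island_means, n_samples):
    """Transpose to one dict per sample, keyed by island name."""
    return [
        {island: means[i] for island, means in island_means.items()}
        for i in range(n_samples)
    ]


# Hypothetical probes measured in two samples.
example_probes = [
    ("island_1", [1.0, 3.0]),
    ("island_1", [3.0, 5.0]),
    ("", [9.0, 9.0]),  # probe outside any CpG island
    ("island_2", [-1.0, 0.0]),
]
island_means = mean_m_by_island(example_probes)
per_sample = samples_on_rows(island_means, n_samples=2)
```

Skipping the empty island name plays the same role as deleting the "- Mean" row in the steps above.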

    • Right-click the transposed spreadsheet, 2_transpose

    • Select Save as... from the pop-up menu

    • Name it mvalues_typeII_probes_CpG_islands

    The mvalues_typeII_probes_CpG_islands spreadsheet can be used as a starting point for ANOVA and other analyses. You can also repeat the steps above to create an equivalent spreadsheet for type I probes.


    Optional: Use MethylationEPIC for CNV analysis

    Although the 450K and MethylationEPIC arrays were initially designed to analyze DNA methylation, they are essentially a dense SNP array and can be used for copy number analysis (Feber et al. 2014). The probe intensity data is easily parsed from the idat files by using the Additional Probe Data Spreadsheet Selection dialog (Figure 1) when importing the raw data. Examining the raw intensity data can also be useful for QA/QC purposes.

    Follow the steps for importing Illumina methylation data detailed in Import and normalize methylation data until you reach the Import Illumina iDAT Data dialog with Manifest File and Output File panels (Figure 1).

    Figure 1. Customizing output during data import

    • Select Customize... to open the Advanced Import Options dialog

    • Choose No normalization in the Normalization section of the Algorithm tab

    • Select the Outputs tab (Figure 2)

    Figure 2. Selecting additional probe data to include during data import

    Information about the different output options can be found by selecting the adjacent () icon.

    Detection p-values. This is the confidence score that the signal of a probe was significantly higher than the background defined by negative control probes. Selecting this checkbox produces a spreadsheet ending with '_detectionp' in addition to the spreadsheet containing beta values. Each row of the _detectionp spreadsheet will be a different sample and the sample names will end in '_detectionp'. This spreadsheet can be used to filter out probes that do not show signal above background.

    Probe Intensity. This is the sum of the methylated and unmethylated intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_probe’ in addition to the spreadsheet containing beta values. Each row of the _probe spreadsheet will be a different sample and the file names will also end in ‘_probe.’ The probe intensity values will be log2 transformed by default (note that the beta values are not log2 transformed).

    Probe Signal. This option will become available if Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_probe.’ The methylated and unmethylated intensities are shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_meth’ or ‘_unmeth’ for methylated and unmethylated values, respectively. The probe intensity values will be log2 transformed by default.

    Raw Probe Intensity. This is the sum of the raw red and green signal intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_raw’ in addition to the spreadsheet containing beta values. Each row of the spreadsheet will be a different sample and the file names will also end in ‘_raw.’ The raw probe intensity values will be log2 transformed by default.

    Raw Probe Signal. This option will become available if Raw Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_raw.’ The red and green intensities will be shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_red’ or ‘_green’ for red and green values, respectively. The raw probe intensity values will be log2 transformed by default.

    Antilog Probe Intensity Values. Selecting this checkbox will show the probe intensity data without log2 transformation.

    Create NCBI GEO Submission Spreadsheets. Generates matrix processed and matrix signal intensities spreadsheets for GEO submission.
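The relationship between the Probe Intensity output and the Antilog Probe Intensity Values option can be sketched in a few lines. This is only an illustration of the arithmetic described above, not Partek's implementation; the function name `probe_intensity` and the example signal values are hypothetical.

```python
import math


def probe_intensity(methylated, unmethylated, antilog=False):
    """Sum the methylated and unmethylated signals for one probe.

    By default the sum is returned on a log2 scale; with antilog=True
    the untransformed (linear) sum is returned instead.
    """
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return total if antilog else math.log2(total)


# Hypothetical signals for a single probe in a single sample.
print(probe_intensity(3000.0, 1096.0))                # log2 scale
print(probe_intensity(3000.0, 1096.0, antilog=True))  # linear scale
```

The linear values are the ones needed when a ratio to a baseline is taken first and the log2 transformation is applied afterwards.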

    How you proceed depends on your study design. Here is an example series of steps to prepare the tutorial data set for copy number analysis:

    • Select Probe Intensity and Antilog Probe Intensity Values (Figure 2)

    • Select OK to close the Advanced Import Options dialog

    • Select Import to import the data and perform the selected normalization method

    Configure the Normalize to Baseline 1 dialog as shown (Figure 3)

    • Select Use control set from this spreadsheet

    • Set Control Category to B cells

    • Select Ratio to baseline from the Normalization Method section

    Figure 3. Configuring normalize to baseline

    • Select OK to generate the spreadsheet

    This spreadsheet contains copy number values per probe in log2 space (i.e. diploid = 0). Prior to performing copy number analysis, you can normalize for local GC abundance.
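To make the "ratio to baseline, then log base 2" idea concrete, here is a minimal sketch. It assumes the baseline for each probe is the mean linear intensity across the control samples; how Partek Genomics Suite actually computes the baseline is not stated in this tutorial. The function name and the toy sample names are hypothetical.

```python
import math
from statistics import mean


def log2_ratio_to_baseline(intensities, groups, control="B cells"):
    """Return log2(sample / baseline) per probe for every sample.

    intensities: dict mapping sample name -> list of linear probe intensities
    groups:      dict mapping sample name -> group label
    The baseline is assumed to be the per-probe mean of the control group.
    """
    control_samples = [s for s, g in groups.items() if g == control]
    if not control_samples:
        raise ValueError("no samples in the control group")
    n_probes = len(next(iter(intensities.values())))
    baseline = [
        mean(intensities[s][i] for s in control_samples)
        for i in range(n_probes)
    ]
    return {
        sample: [math.log2(v / b) for v, b in zip(values, baseline)]
        for sample, values in intensities.items()
    }


# Toy data: two control samples and one test sample, two probes each.
intensities = {"ctrl_1": [100.0, 200.0], "ctrl_2": [300.0, 200.0], "test_1": [400.0, 100.0]}
groups = {"ctrl_1": "B cells", "ctrl_2": "B cells", "test_1": "LCLs"}
print(log2_ratio_to_baseline(intensities, groups))
```

A probe whose intensity matches the baseline gets a value of 0, which is why diploid corresponds to 0 in the resulting spreadsheet.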

    • Select Transform

    • Select Adjust Based on Local GC Content...

    • Click OK to run Local GC Adjustment (Figure 4)

    Figure 4. Adjusting for local GC content

    The GC adjusted spreadsheet is the starting spreadsheet for copy number analysis. You can now switch over to the Copy number workflow, skip the Create copy number step, and begin with the Detect amplifications and deletions step. Consult the user's guide for the copy number workflow for subsequent steps.

    References

    Feber A, Guilhamon P, Lechner M, et al. Using high-density DNA methylation arrays to profile copy number alterations. Genome Biology. 2014;15(2):R30. doi:10.1186/gb-2014-15-2-r30.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Filter loci with the interactive filter

    The list, LCLs vs B cells, includes differentially methylated loci for locations across the genome; however, in many cases we may want to focus on loci located in particular regions of the genome. To filter our list to include only regions of interest, we can use the annotations provided by Illumina and the interactive filter in Partek Genomics Suite.

    • Select LCLs_Vs_B_cells from the spreadsheet tree

    • Right-click on the Gene Symbol column

    Perform gene set and pathway analysis

    To perform gene set and pathway analysis, we need to create a list of genes that overlap with differentially methylated CpG loci.

    • Select LCLs_vs_B_cells_CpG_Islands in the spreadsheet tree

    • Select Find Overlapping Genes from the Analysis section of the workflow

    The Output Overlapping Features dialog will open (Figure 1). This dialog allows you to choose the annotation database that will define where genes are located. By default, the promoter region will be defined as 5000 base pairs upstream and 3000 base pairs downstream from the transcription start site.
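The default promoter definition can be illustrated with a short sketch. It assumes the window is strand-aware (upstream means lower coordinates on the + strand and higher coordinates on the - strand) and that both ends are inclusive; neither detail is stated in the dialog description, and the function names are hypothetical.

```python
def promoter_window(tss, strand, upstream=5000, downstream=3000):
    """Return (start, end) of the promoter region around a transcription start site."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    if strand == "-":
        # On the reverse strand, "upstream" lies at higher coordinates.
        return (tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def in_promoter(position, tss, strand):
    """True if a CpG position falls inside the default promoter window."""
    start, end = promoter_window(tss, strand)
    return start <= position <= end


# Hypothetical gene with its TSS at position 10,000.
print(promoter_window(10_000, "+"))
print(promoter_window(10_000, "-"))
print(in_promoter(6_000, 10_000, "+"))
```

A locus that falls in this window, or in the gene body defined by the chosen annotation database, is reported as overlapping that gene.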

    • Select the (_probe) spreadsheet from the spreadsheet tree

    • Delete any samples with _detectionp names

    • Create sample attributes and assign samples to the groups as described in Annotate samples

    • Select Transform from the main toolbar

    • Select Normalize to baseline

    • Select After ratio apply log base 2

    • Select New Spreadsheet from the Configure Output section

    • Select Infinium_Design_Type and UCSC_CpG_Islands_Name from the Column Configuration options (Figure 2)

    • Right-click the temporary spreadsheet in the spreadsheet tree to bring up the pop-up dialog

    • Select Clone... (Figure 3)

    • Repeat process to create a spreadsheet for type I probes

    • Select OK

    • Close the source temporary spreadsheet by selecting it in the spreadsheet tree and selecting ()

    Select Insert Annotation (Figure 1)

    Figure 1. Adding an annotation column to the ANOVA results

    • Select the Add as categorical option

    • Select Relation_to_UCSC_CpG_Island (Figure 2)

    CpG islands are regions of the genome with an atypically high frequency of CpG sites. CpG islands and their surrounding regions (termed shelf and shore) include many gene promoters and altered methylation in these regions can have a disproportionate effect on gene expression. For example, hyper-methylation of promoter CpG islands is a common mechanism for down-regulating gene expression in cancer.

    Figure 2. Adding chromosome location to ANOVA results

    • Select OK to add Relation_to_UCSC_CpG_Island as a column next to 3. Gene Symbol

    • Select () from the quick action bar to save the ANOVA-2way (ANOVA Results) spreadsheet with the added annotation

    Now, we can filter probes by their relation to CpG islands.

    • Select () from the quick action bar to invoke the interactive filter

    • Select 4. Relation_to_UCSC_CpG_Island for Column

    For categorical columns, the interactive filter displays each category of the selected column as a colored bar. For 4. Relation_to_UCSC_CpG_Island, each bar represents one of the categories of the UCSC annotation. To filter out a category, left-click on its bar. Right-clicking on a bar will include only the selected category. A pop-up balloon will show the category label as you mouse over each bar.

    • Right-click the Island column to filter out other columns (Figure 3)

    Figure 3. Using Interactive Filter tool to filter out probes by annotation. When pointed to a categorical column, the Interactive Filter tool summarises the content of the column by a column chart. Left-click to exclude a category (two columns were excluded, so they are grayed out), right-click to include only

    The yellow and black bar on the right-hand side of the spreadsheet panel shows the fraction of excluded cells in black and included cells in yellow. Right-clicking this bar brings up an option to clear the filter.

    Now that we have filtered out probes that are not in CpG islands, we will create a spreadsheet containing only these probes.
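For readers who think in code, the include-only and exclude behaviour of the interactive filter on a categorical column is equivalent to the following sketch. The probe names are hypothetical and the category labels are illustrative examples of the annotation values; the functions are not part of any Partek API.

```python
COLUMN = "Relation_to_UCSC_CpG_Island"

# Hypothetical rows standing in for the annotated ANOVA results spreadsheet.
rows = [
    {"probe": "probe_A", COLUMN: "Island"},
    {"probe": "probe_B", COLUMN: "N_Shore"},
    {"probe": "probe_C", COLUMN: "Island"},
    {"probe": "probe_D", COLUMN: "S_Shelf"},
]


def include_only(rows, column, category):
    """Equivalent of right-clicking a bar: keep only rows in that category."""
    return [row for row in rows if row[column] == category]


def exclude(rows, column, category):
    """Equivalent of left-clicking a bar: drop rows in that category."""
    return [row for row in rows if row[column] != category]


islands = include_only(rows, COLUMN, "Island")
print([row["probe"] for row in islands])
```

Cloning the filtered spreadsheet then corresponds to saving `islands` as a new table, leaving the original rows untouched.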

    • Right-click on the LCLs vs. B cells spreadsheet in the spreadsheet tree panel (Figure 4)

    Figure 4. Cloning a filtered spreadsheet creates a new spreadsheet with only the included cells

    • Select Clone

    • Rename the new spreadsheet LCLs_vs_B_cells_CpG_Islands using the Clone Spreadsheet dialog

    • Select mvalues from the Create new spreadsheet as a child spreadsheet: drop-down menu (Figure 5)

    • Select OK

    Figure 5. Renaming and configuring filtered spreadsheet

    • Select () from the quick action bar to save the filtered spreadsheet

    • Specify a name for the spreadsheet using the Save File dialog (we chose LCLs_vs_B_cells_CpG_Islands)

    • Select Save to save the spreadsheet

    You may want to save the project before proceeding to the next section of the tutorial.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Figure 1. Selecting Find Overlapping Genes from the main toolbar

    • Select Ensembl Transcripts release 75 from the Report regions from the specified database options

    • Enter a name for the new list; we have named it gene-list

    • Select OK

    A new spreadsheet will be created as a child spreadsheet (Figure 2)

    Figure 2. Annotating the differentially methylated CpG loci with genes

    Partek Genomics Suite offers several tools to help interpret this list of genes. First, let's look at Gene Set Analysis.

    • Select Gene Set Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow

    • Select GO Enrichment for Select the method of analysis

    • Select Next >

    • Select 1/mvalue/lcls_vs_b_cells_cpg_islands/gene-list (gene-list.txt) for the source spreadsheet

    • Select Next >

    • Select Invoke gene ontology browser on the result and leave the rest of the options set to defaults for Configure the parameters of the test (Figure 3)

    Figure 3. Configuring the parameters of the test

    • Select Next >

    • Select Default Mapping File for Select the method of mapping genes to gene sets

    • Select Next >

    A new spreadsheet will be created with categories ranked by enrichment score, and the Gene Ontology Browser will launch to graphically display the results of the spreadsheet (Figure 4). The results show which gene sets are overrepresented in the list of genes overlapped by differentially methylated CpG loci between the experimental and control groups.
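As a rough illustration of what "overrepresented" means here, the sketch below computes a one-sided hypergeometric p-value for a single gene set and converts it to a score. It assumes the score is the negative natural log of that p-value; the exact test and score used by Partek Genomics Suite are not described in this tutorial, and the function names and counts are hypothetical.

```python
from math import comb, log


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment_score(k, K, n, N):
    """Assumed score: -ln(p) of the one-sided hypergeometric test."""
    return -log(hypergeom_upper_tail(k, K, n, N))


# Hypothetical counts: 20,000 background genes, 150 in the GO term,
# 800 genes in the list, 15 of which belong to the GO term.
p = hypergeom_upper_tail(15, 150, 800, 20_000)
print(f"p = {p:.3g}, score = {enrichment_score(15, 150, 800, 20_000):.2f}")
```

A larger score means the overlap between the gene list and the gene set is less likely to have arisen by chance.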

    Figure 4. GO enrichment browser showing gene groups overrepresented in the list of genes which overlap with differentially methylated CpG loci

    To get a better idea of whether genes associated with these GO terms have increased or decreased methylation, we can view the Forest Plot.

    • Select the Forest Plot tab

    GO terms are listed by the number of significantly up-regulated genes, with the percentages of up-regulated and down-regulated genes shown in red and green bars. Here, we see that most GO terms show increased methylation in their associated genes (Figure 5).

    Figure 5. Gene Ontology Forest Plot

    Next, we can perform Pathway Analysis to see which pathways are overrepresented in the genes overlapped by differentially methylated CpG loci.

    • Select gene-list from the spreadsheet tree

    • Select Pathway Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow

    • Select Pathway Enrichment for Select the method of analysis

    • Select Next >

    • Select 1/mvalue/lcls_vs_b_cells_cpg_islands/gene-list (gene-list.txt) for the source spreadsheet

    • Select Next >

    • Leave the default selections for the Configure parameters of the test panel

    • Select Next >

    • Leave the default selections for the Result File and Select the parameters panels

    • Select Next > to run the analysis

    The Pathway-Enrichment spreadsheet will be added to the spreadsheet tree in Partek Genomics Suite and the Partek® Pathway™ software will open to provide visualization of the most significantly enriched pathway as a pathway diagram (Figure 6). The color of the gene boxes reflects p-values of the associated differentially methylated CpG loci (bright orange is insignificant, blue is highly significant). The Color by option can be changed to another column from the gene-list.txt spreadsheet, such as Difference.

    Figure 6. Partek Pathway illustrating one of the pathways overrepresented in the list of genes overlapping the differentially methylated CpG sites

    The Pathway-Enrichment spreadsheet can also be viewed in Partek Pathway by switching to the Pathway-Enrichment section of the menu tree on the left-hand side of the window. From the spreadsheet view, you can select a pathway name to visualize that pathway. Alternatively, you can open a pathway visualization in Partek Pathway from the Pathway-Enrichment spreadsheet in Partek Genomics Suite by right-clicking on a row and selecting Show pathway... from the pop-up menu. Please note that if you have closed Partek Pathway and reopened it, you will need to import a gene list if you want to color the visualization by attributes from the gene list. For more information about using Partek Pathway, check out our Partek Pathway Tutorial.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Annotate samples

    Each row of the spreadsheet (Figure 1) corresponds to a single sample. The first column contains the names of the .idat files and the remaining columns are the array probes. The table values are β-values, which correspond to the proportion of methylation at each site (0 is unmethylated, 1 is fully methylated). A β-value is calculated as the ratio of methylated probe intensity over the overall intensity at each site (the overall intensity is the sum of methylated and unmethylated probe intensities).
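The β-value definition above translates directly into code. The sketch below follows the ratio exactly as described (no stabilizing offset in the denominator, which some pipelines add) and also shows the standard logit-style conversion to M-values, the scale used by the mvalues spreadsheet later in the tutorial. The function names and the example intensities are hypothetical.

```python
import math


def beta_value(methylated, unmethylated):
    """Methylated intensity divided by the summed (overall) intensity."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("overall intensity must be positive")
    return methylated / total


def m_value(beta):
    """Standard conversion of a beta value to an M-value: log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


# Hypothetical intensities for one probe.
beta = beta_value(3000.0, 1000.0)
print(beta, m_value(beta))
```

A β-value of 0.5 corresponds to an M-value of 0; values above 0.5 give positive M-values and values below 0.5 give negative ones.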

    Figure 1. Spreadsheet after .idat file import: samples on rows (Sample IDs are based on file names), probes on columns, cell values are functionally normalized beta values (default settings)

    Before we can perform any analysis, the study samples need to be organized into their experimental groups.

    • Select Add Sample Attributes from the Import section of the Illumina BeadArray Methylation workflow

    • Select Add a Categorical Attribute from the Add Sample Attributes dialog (Figure 2)

    Figure 2. Adding sample attributes. Adding Attributes from an Existing Column can be used to split file names into sections, based on delimiters (e.g. _, -, space etc.). Adding a Numeric or Categorical Attribute enables the user to manually specify sample attributes

    • Select OK

    The Create categorical attribute dialog allows us to create groups for a categorical attribute. By default, two groups are created, but additional groups can be added.

    • Set Attribute name: to Cell Type

    • Rename the groups B cells and LCLs

    • Drag and drop the samples from the Unassigned list to their groups as listed in the table below

    Sample ID                         Cell Type
    GSM2452106_200483200025_R04C01    B cells
    GSM2452107_200483200021_R01C01    B cells
    GSM2452108_200483200021_R02C01    B cells
    GSM2452109_200483200025_R06C01    B cells
    GSM2452110_200483200025_R07C01    B cells
    GSM2452111_200483200021_R08C01    B cells

    There should now be two groups with eight samples in each group (Figure 3).

    Figure 3. Adding Cell Type attribute as a categorical group

    • Select OK

    • Select Yes from the Add another categorical attribute dialog

    • Set Attribute name: to Gender

    • Rename the groups Male and Female

    • Drag and drop the samples from the Unassigned list to their groups as listed in the table below

    Sample ID                         Gender
    GSM2452106_200483200025_R04C01    Female
    GSM2452107_200483200021_R01C01    Female
    GSM2452108_200483200021_R02C01    Male
    GSM2452109_200483200025_R06C01    Female
    GSM2452110_200483200025_R07C01    Female
    GSM2452111_200483200021_R08C01    Female

    There should now be two groups with four samples in Male and twelve samples in Female (Figure 4).

    Figure 4. Adding Gender attribute as a categorical group

    • Select OK

    • Select No from the Add another categorical attribute dialog

    • Select Yes to save the spreadsheet

    Two new columns have been added to spreadsheet 1 (Methylation) with the cell type and gender of each sample (Figure 5).

    Figure 5. Annotated beta values spreadsheet
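The group assignments from this section can be cross-checked with a short script. The sample IDs, cell types, and genders are taken from the tables in this tutorial; splitting each file name on the underscore mirrors the delimiter-based option mentioned in Figure 2, and the field names `accession`, `slide`, and `position` are labels chosen here for illustration only.

```python
from collections import Counter

# (sample ID, cell type, gender) as listed in the tutorial tables.
samples = [
    ("GSM2452106_200483200025_R04C01", "B cells", "Female"),
    ("GSM2452107_200483200021_R01C01", "B cells", "Female"),
    ("GSM2452108_200483200021_R02C01", "B cells", "Male"),
    ("GSM2452109_200483200025_R06C01", "B cells", "Female"),
    ("GSM2452110_200483200025_R07C01", "B cells", "Female"),
    ("GSM2452111_200483200021_R08C01", "B cells", "Female"),
    ("GSM2452112_200483200021_R06C01", "B cells", "Female"),
    ("GSM2452113_200483200021_R04C01", "B cells", "Male"),
    ("GSM2452114_200483200025_R01C01", "LCLs", "Female"),
    ("GSM2452115_200483200025_R03C01", "LCLs", "Female"),
    ("GSM2452116_200483200021_R03C01", "LCLs", "Male"),
    ("GSM2452117_200483200025_R05C01", "LCLs", "Female"),
    ("GSM2452118_200483200025_R02C01", "LCLs", "Female"),
    ("GSM2452119_200483200021_R07C01", "LCLs", "Female"),
    ("GSM2452120_200483200021_R05C01", "LCLs", "Female"),
    ("GSM2452121_200483200025_R08C01", "LCLs", "Male"),
]

cell_type_counts = Counter(cell_type for _, cell_type, _ in samples)
gender_counts = Counter(gender for _, _, gender in samples)

# These match the group sizes stated in the tutorial text.
assert cell_type_counts == {"B cells": 8, "LCLs": 8}
assert gender_counts == {"Male": 4, "Female": 12}

# Split a file name into its underscore-delimited sections.
accession, slide, position = samples[0][0].split("_")
print(accession, slide, position)
```

If either assertion fails after editing the list, a sample has been assigned to the wrong group.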

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Sample ID                         Cell Type
    GSM2452112_200483200021_R06C01    B cells
    GSM2452113_200483200021_R04C01    B cells
    GSM2452114_200483200025_R01C01    LCLs
    GSM2452115_200483200025_R03C01    LCLs
    GSM2452116_200483200021_R03C01    LCLs
    GSM2452117_200483200025_R05C01    LCLs
    GSM2452118_200483200025_R02C01    LCLs
    GSM2452119_200483200021_R07C01    LCLs
    GSM2452120_200483200021_R05C01    LCLs
    GSM2452121_200483200025_R08C01    LCLs

    Sample ID                         Gender
    GSM2452112_200483200021_R06C01    Female
    GSM2452113_200483200021_R04C01    Male
    GSM2452114_200483200025_R01C01    Female
    GSM2452115_200483200025_R03C01    Female
    GSM2452116_200483200021_R03C01    Male
    GSM2452117_200483200025_R05C01    Female
    GSM2452118_200483200025_R02C01    Female
    GSM2452119_200483200021_R07C01    Female
    GSM2452120_200483200021_R05C01    Female
    GSM2452121_200483200025_R08C01    Male