This document describes how to set up and configure a node-locked Partek Genomics Suite license. It applies to users who have purchased a node-locked license.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This guide is specific to the installation of Partek Genomics Suite software on a Windows operating system.
With administrative privileges, click the button below to download the latest version of Partek Genomics Suite.
Download Partek™ Genomics Suite™
In some cases, a Microsoft Visual C++ Package failure message may appear (Figure 1). Select "Yes" to continue with the installation.
Once the download is completed, start the application by double-clicking the Partek Genomics Suite application icon located on your desktop. The default Partek License Manager window will appear. You will be prompted to provide a license file.
1. Save the license.dat (or license.lic) file that you received from the Partek Licensing department to your desktop.
If you do not have a license, please contact your account representative or request a trial.
2. Select Add License (Figure 2).
3. Select the License file radio button.
4. Select Browse.
5. Click on the license.dat (or license.lic) file located on your desktop and select Open (Figure 3).
6. The Partek License Manager - Add License screen will appear. Select Add (Figure 4).
License file path: C:/Users/username/Desktop/license.dat
License file directory: C:\Program Files\Partek Genomics Suite 7.0\license
The Partek License Manager window will now show you the status of your license (Figure 5).
7. Exit the Partek License Manager and Partek Genomics Suite will automatically start.
Once the software has been installed and the license has been added, you may delete the license.dat (or license.lic) file from your desktop (if you prefer, this is not required); a copy of your license file is saved to your license file folder (C:\Program Files\Partek Genomics Suite 7.0\license folder) after it has been added using the Partek License Manager.
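Before deleting the desktop copy, it can be reassuring to confirm that a license file really is present in the license folder. The following is a minimal sketch, not a Partek utility: the function name `find_license_files` is hypothetical, and it only looks for the two file names mentioned in this guide.

```python
from pathlib import Path

# File names mentioned in this guide; the folder is supplied by the caller.
LICENSE_NAMES = ("license.dat", "license.lic")


def find_license_files(folder):
    """Return the names of license files found directly inside `folder`."""
    base = Path(folder)
    if not base.is_dir():
        # A missing folder is reported as "nothing found" rather than an error.
        return []
    return sorted(p.name for p in base.iterdir() if p.name in LICENSE_NAMES)
```

For example, calling `find_license_files(r"C:\Program Files\Partek Genomics Suite 7.0\license")` on a Windows installation that uses the default folder returns the license file names found there, or an empty list if none are present.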
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This guide is specific to the installation of Partek Genomics Suite software on a Macintosh operating system.
With administrative privileges, click on the button below to download the latest version of Partek Genomics Suite.
Once the installation is completed, drag the Partek Genomics Suite icon into your Applications folder (Figure 1). Once it is in the Applications folder, start the application by right-clicking the Partek Genomics Suite icon and selecting Open.
In some cases, the security preferences may ask you to verify the software download (Figure 2).
1. Save the license.dat (or license.lic) file that you received from the Partek Licensing department to your desktop.
2. Select Add License.
3. Select the License file radio button.
4. Select Browse.
5. Click on the license.dat (or license.lic) file located on your desktop and select Open.
6. The Partek License Manager - Add License screen will appear. Select Add.
The Partek License Manager window will now show you the status of your license.
7. Exit the Partek License Manager and Partek Genomics Suite will automatically restart.
Once the software has been installed and the license has been added, you may delete the license file from your desktop (if you prefer, although this is not required); a copy of your license file is saved to your license file folder (/Users/Shared) after it has been added using the Partek License Manager.
If you do not have a license, please contact your account representative or request a trial.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This document describes the steps necessary to set up and configure a FlexNet license server for use with locked floating and floating concurrent Partek Genomics Suite licenses.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Partek Genomics Suite (Next Generation Sequencing Studies)
Windows: 7 SP1 or newer
Mac: OS 10.11 - 10.13.4
Linux: Ubuntu® 12.04, Red Hat® 6, CentOS® 6, or newer
64-bit 2GHz quad-core processor
16GB of onboard memory
500GB of storage available for data and installation
A graphics card with OpenGL-capable drivers
Partek Genomics Suite (Microarray Studies)
Windows: 7 SP1 or newer
Mac: OS 10.11 - 10.13.4
Linux: Ubuntu® 12.04, Red Hat® 6, CentOS®, or newer
16GB of onboard memory
2GHz processor
500MB of storage available for data and installation
A graphics card with OpenGL-capable drivers
Partek Pathway
Windows: 7 SP1 or newer
Mac: OS 10.11 - 10.13.4
Linux: Ubuntu® 12.04, Red Hat®, CentOS®, or newer
16GB of onboard memory
2GHz processor
500MB of storage available for data and installation
A graphics card with OpenGL-capable drivers
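The hardware minimums listed above can be compared against a machine's specifications mechanically. The sketch below is illustrative only: the profile keys (`ngs`, `microarray`, `pathway`), the field names, and the single-core default for the Microarray and Pathway profiles are assumptions made for this example and are not part of any Partek tool. Operating system versions and the graphics card requirement are not checked.

```python
# Minimums copied from the lists above. Storage is in GB, so 500 MB is 0.5.
# "cores": 1 is an assumption where the list only says "2GHz processor".
REQUIREMENTS = {
    "ngs": {"ram_gb": 16, "cpu_ghz": 2.0, "cores": 4, "storage_gb": 500},
    "microarray": {"ram_gb": 16, "cpu_ghz": 2.0, "cores": 1, "storage_gb": 0.5},
    "pathway": {"ram_gb": 16, "cpu_ghz": 2.0, "cores": 1, "storage_gb": 0.5},
}


def unmet_requirements(profile, machine):
    """Return the requirement keys that `machine` fails for `profile`."""
    required = REQUIREMENTS[profile]
    # A key missing from `machine` counts as 0 and therefore as unmet.
    return [key for key, minimum in required.items() if machine.get(key, 0) < minimum]
```

A workstation described as `{"ram_gb": 16, "cpu_ghz": 2.4, "cores": 4, "storage_gb": 100}` would pass the Microarray profile but fail the Next Generation Sequencing profile on storage.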
This guide is specific to the installation of Partek Genomics Suite software on a Linux operating system.
1. Download the latest Linux version of Partek Genomics Suite.
2. Click on Download under the Linux section and save/open the file. Extract it to your preferred folder and note its location.
4. Install FlexNet following this guide.
5. Open a terminal and navigate to the location where you extracted the "partekgs" folder.
6. Go into the "bin" directory and run the "partek" file (Figure 2).
7. The Partek License Manager will appear.
8. If you do not have a license, please contact licensing@partek.com.
Be sure to click on the "Copy Information" button under the "Computer information" section of this window to paste the information into the email to licensing@partek.com.
9. If you have a license, click on the "+ Add License" button.
10. The Partek License Manager - Add License window will appear.
11. Select the "License server" radio button and fill the "Server Name" with the "Host Name" that you copied from the "Computer information" section.
12. Select "Add".
13. After adding the license, the Partek License Manager window will reappear. Select "Validate Licenses" and the license information will show up in the window.
14. Close the Partek License Manager and run "./partek" again from the "partekgs/bin" directory.
15. The Partek License Agreement window will appear; select "Agree".
You are now able to use Partek Genomics Suite.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This document is specific to the installation of a floating concurrent Partek Genomics Suite license on a Windows server. It is not required to install the full version of Partek on your license server; only the license server executables installed by the Partek License Server installer are required to serve Partek FlexNet licenses on your network.
1. Download the Partek License Server installer.
2. When prompted, select "C:\FlexNet" as the installation folder and click “Next”.
3. Select all components from the list and click “Next”.
4. Select “Partek License Server” as the Start Menu folder and click “Next”.
5. Click "Install" and "Finish".
6. Save the license file (license.lic or license.dat) that the Partek Licensing Support team sent you as license.lic and proceed to the Configuration section below.
The full pathname will be "C:\FlexNet\license.lic"
To install FlexNet to run as a system service (use admin privileges or right click and "Run as administrator"):
1. Run lmtools.
2. Navigate to the “Service/License File” tab and select the “Configuration using Services” radio button (figure 1).
3. Navigate to the “Config Services” tab and fill in the following (figure 2):
a. Service Name: Partek FlexNet Server
b. Path to the lmgrd.exe file: C:\FlexNet\lmgrd.exe
c. Path to the license: C:\FlexNet\license.lic
d. Path to the debug log file: C:\FlexNet\log.txt
e. Check “Use Services” and “Start Server at Power Up”
f. Click “Save Service”
4. Navigate to the "Start/Stop/Reread" tab and start the service by clicking the "Start Server" button (figure 3).
This will add a FlexNet server for Partek licenses into your set of system services. You can then control this service with the standard services control panel or lmtools.exe.
Please follow the directions below on the computer on which you wish to install Partek Genomics Suite. Our Licensing team will use this information to generate your license file. This license file will be emailed back to you as an attachment with installation instructions.
Windows, Macintosh, Linux:
1. With administrative privileges, download the latest version of Partek Genomics Suite.
2. Once the installation is complete, start the application by double clicking on the Partek Genomics Suite icon.
3. Select Copy Information from the Computer Information section and paste the retrieved host name and ID in an email and send it to your account representative (figure 1).
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
For a step-by-step video to help you set up your license server on a Windows platform, please visit: .
For information about firewall configuration, refer to the FlexNet License Administration Guide and/or the ReadMe_FlexNetFirewallPinholes.txt document located in your FLEXnet folder.
For advanced configuration options or questions, refer to the FlexNet License Administration Guide.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This document describes the steps necessary to set up a FlexNet license server for Partek Genomics Suite on a Linux server. You are not required to install the full version of Partek on your license server; only the license server executables installed by the Partek License Server installer or contained in FLEXnet11.12.zip are required to serve Partek FlexNet licenses on your network.
You must log on to the license server with an account which has administrative or sudo root privileges to install a system service (sysv or systemd), create directories on system folders, and/or modify the system configuration. Partek recommends completing all installation instructions with an account with administrative or sudo root privileges. If you install into a non-system directory, you may run the license server manually (see the Run the License Server Manually section).
Follow either the Automated Installation or the Manual Installation instructions below.
(make sure lsb-core is installed)
Automated Installation
1. Download the Linux Installer.
2. When prompted, select "/opt/FlexNet" as the directory and click "Continue".
3. In the pop-up menu, select all components from the list and click "Continue".
4. Click "Install" and proceed to the Configuration section below.
Manual Installation
1. Unzip the FlexNet11.12.zip file into the folder of your choice. The recommended location is "/opt". This will create a folder called "FlexNet" (its full pathname will be "/opt/FlexNet") containing the executable and providing a location to store your license.lic file.
2. Move all of the files from the linux64 subfolder to the /opt/FlexNet folder.
3. Proceed to the Configuration section below.
Save your license.dat or license.lic file into the FlexNet folder as license.lic. Its full pathname will be "/opt/FlexNet/license.lic" if you use the recommended installation directory.
Follow either the Systemd Installation or SYSV Installation instructions below.
Systemd Installation
To install FlexNet to run as a systemd system service (use admin privileges or sudo root):
1. Create user "parteklm" in group "parteklm"
sudo useradd -r -s /sbin/nologin parteklm
sudo chown parteklm:parteklm /opt/FlexNet
2. If not using default values, edit the file parteklm.service in the installation directory (/opt/FlexNet/parteklm.service) and change the non-default path, User, and/or Group
3. Make sure that /usr/tmp is created and has correct permissions by running the commands:
sudo mkdir -p /usr/tmp
sudo chmod 1777 /usr/tmp
4. Copy the parteklm.service file into place (/usr/lib/systemd/system may also be used):
sudo cp /opt/FlexNet/parteklm.service /etc/systemd/system/.
5. Enable the service:
sudo systemctl enable parteklm
6. Start the service:
sudo systemctl start parteklm
This will add a FlexNet server for Partek licenses into your set of system services that will automatically start on system restart. You can use the standard systemctl command to manage the service.
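Before enabling the service, it may help to confirm that the prerequisites described in the steps above are in place. The following is a minimal sketch of such a check, assuming the recommended paths from this guide; the function name `preflight` and the wording of its messages are hypothetical, and it does not inspect the parteklm user or the service file itself.

```python
import os
import stat


def preflight(flexnet_dir="/opt/FlexNet", tmp_dir="/usr/tmp"):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    # The license must be saved as license.lic inside the FlexNet folder.
    license_path = os.path.join(flexnet_dir, "license.lic")
    if not os.path.isfile(license_path):
        problems.append(f"missing license file: {license_path}")
    # The temporary directory must exist with mode 1777 (sticky, world-writable).
    if not os.path.isdir(tmp_dir):
        problems.append(f"missing directory: {tmp_dir}")
    elif stat.S_IMODE(os.stat(tmp_dir).st_mode) != 0o1777:
        problems.append(f"{tmp_dir} is not mode 1777")
    return problems
```

Calling `preflight()` with no arguments checks the default locations and returns any problems as plain strings that can be printed or logged.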
SYSV Installation
To install FlexNet to run as a sysv system service (still using admin privileges or sudo root):
1. Create user "parteklm" in group "parteklm"
sudo useradd -r -s /sbin/nologin parteklm
sudo chown parteklm:parteklm /opt/FlexNet
2. If not using default values, edit the file parteklm.init in the installation directory (/opt/FlexNet/parteklm.init) and change the non-default path (FLEXNETDIR), USER, and/or GROUP
3. Make sure that /usr/tmp is created and has the correct permissions by running the following commands:
sudo mkdir -p /usr/tmp
sudo chmod 1777 /usr/tmp
4. Copy the parteklm.init file into place:
sudo cp /opt/FlexNet/parteklm.init /etc/init.d/parteklm
sudo chmod +x /etc/init.d/parteklm
5. Enable the service (using chkconfig):
sudo chkconfig --add parteklm
sudo chkconfig parteklm on
6. Start the service:
sudo service parteklm start
This will add a FlexNet server for Partek licenses into your set of system services that will automatically start on system restart. You can use the standard service command to manage the service.
To run the license server manually, you may use the command line (where <path_to_FlexNet> is /opt/FlexNet or the folder where you installed FlexNet):
cd <path_to_FlexNet>
./lmgrd -c <path_to_FlexNet> -L <path_to_FlexNet>/log.txt
Putting a "+" (plus) character in front of the path to the log file (log.txt) causes the license manager server to append logging entries.
For more details, see the lmgrd - License Server Manager/Starting the License Server Manager on UNIX Platforms/Manual Start section of the FlexNet License Administration Guide.
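If the manual start is scripted, the command shown above can be assembled programmatically. The sketch below only builds the argument list and does not start anything; the function name `build_lmgrd_command` and its parameters are hypothetical. It mirrors the flags exactly as written in this guide and uses the full path to lmgrd instead of changing directory first.

```python
import posixpath


def build_lmgrd_command(flexnet_dir="/opt/FlexNet", append_log=True):
    """Build the manual-start argument list for lmgrd as described above."""
    log_path = posixpath.join(flexnet_dir, "log.txt")
    if append_log:
        # A leading "+" asks the license manager to append to the log file.
        log_path = "+" + log_path
    return [posixpath.join(flexnet_dir, "lmgrd"), "-c", flexnet_dir, "-L", log_path]
```

The resulting list can be passed to `subprocess.run` on the license server once the paths have been verified.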
Refer to the FlexNet License Administration Guide and/or the ReadMe_FlexNetFirewallPinholes.txt document located in your FlexNet folder for information about firewall configuration.
Refer to the FlexNet License Administration Guide for advanced configuration options or questions.
This guide is for Partek Genomics Suite version 6.6 users with node-locked licenses.
Before following the steps shown in the videos below, please:
Download the Partek Genomics Suite version 7.0 installation file from here.
Download the Partek Genomics Suite version 7.0 license file you received from our licensing team and save it to your desktop. If you do not have a Partek Genomics Suite 7.0 license file, please contact licensing@partek.com.
Installation on Windows 10 is shown, but the process will be similar for older versions of Windows and on Mac.
After installing and adding your license file, you can delete the installation file and license file from your desktop; a copy of your license file is saved to your license file folder (C:\Program Files\Partek Genomics Suite 7.0\license by default on Windows) after it has been added using the license manager.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Frequently Asked Questions related to Partek Genomics Suite License Server
The log file is written on the computer that runs the license server. The user specifies the location of this log file when running the lmgrd command.
The log file can be found in the following folders, depending on your license server's platform:
Windows: "C:\FLEXnet" or "C:\Program Files (x86)\FlexNet Publisher License Server Manager\logs\parteklm"
Linux: “/opt/FlexNet”
Mac: "/Users/Shared/FlexNet"
To access the log file, open the file on the license server with your favorite text editor.
The log file may be viewable by more than one person but only on the same computer as the license server.
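For long-running servers the log file can grow large, and usually only the most recent entries are of interest. As an alternative to opening the whole file in a text editor, the following sketch prints the last few lines; the function name `tail` is hypothetical and no assumption is made about the format of the log entries.

```python
from collections import deque


def tail(path, n=20):
    """Return the last `n` lines of the text file at `path`."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

On a Linux license server installed in the recommended location, `tail("/opt/FlexNet/log.txt", 50)` would return the 50 most recent log lines.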
Restarting the license server will temporarily force users off of the server.
An options file may be used to prevent certain users or computer network addresses from using license features by means of the EXCLUDE or EXCLUDEALL keywords (see the Managing the Options File chapter of the FlexNet License Administration Guide). If you set up an options file with EXCLUDE or EXCLUDEALL and restart the license server, you will "kick out" the excluded user but not other users.
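To illustrate what such options file entries look like, the sketch below formats EXCLUDE and EXCLUDEALL lines following the general syntax given in the FlexNet License Administration Guide (`EXCLUDE feature type name` and `EXCLUDEALL type name`). The helper name `exclude_line` is hypothetical, as are any user, host, and feature names passed to it; consult the guide for the full syntax before editing a production options file.

```python
# Entity types accepted by EXCLUDE/EXCLUDEALL per the FlexNet guide.
VALID_TYPES = {"USER", "HOST", "DISPLAY", "INTERNET", "PROJECT", "GROUP", "HOST_GROUP"}


def exclude_line(kind, name, feature=None):
    """Format one options file line that excludes `name` of type `kind`."""
    kind = kind.upper()
    if kind not in VALID_TYPES:
        raise ValueError(f"unsupported type: {kind}")
    if feature is None:
        # EXCLUDEALL blocks every feature served by this vendor daemon.
        return f"EXCLUDEALL {kind} {name}"
    # EXCLUDE blocks a single named feature.
    return f"EXCLUDE {feature} {kind} {name}"
```

For instance, `exclude_line("user", "jsmith")` produces `EXCLUDEALL USER jsmith`, where `jsmith` stands in for a real user name.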
Partek Genomics Suite is a comprehensive suite of advanced statistics and interactive data visualization specifically designed to reliably extract biological signals from noise. Designed for high-dimensional genomic studies containing thousands of samples, Partek Genomics Suite is fast, memory efficient and will analyze large data sets on a personal computer. It supports a complete workflow including convenient data access tools, identification and annotation of important biomarkers, and construction and validation of predictive diagnostic classification systems.
Additional information can be found in the manual for Partek Genomics Suite version 6.6.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This user guide describes how to export copy number and genotype data using Partek's Report Plug-in for Illumina GenomeStudio Genotype Module for use in Partek Genomics Suite. The GenomeStudio plug-in lets you export data into a project that can be opened directly in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina data into Partek Genomics Suite.
Download the plug-in zip file
Unzip the file. It contains a folder called PartekReport, which holds two .dll files (Partek.Common.dll and Partek.GeneExpression.GenomeStudio.dll). Move the PartekReport folder to
C:\Program Files\Illumina\GenomeStudio 2.0\Modules\BSGT\ReportPlugins. If there is no ReportPlugins folder in the BSGT folder, create one; the path and folder names must exactly match those given above (Figure 1).
In GenomeStudio genotype project:
Choose Analysis > Reports > Report Wizard from the main menu
Select Custom Report and choose Partek Report Plug-in from the drop-down list
Specify AnnotationName; do NOT include <> in the name. You can use the same name as the .bgx file you imported the data with, or a name unique to your dataset
Figure 1. Configuring the GenomeStudio copy number report dialog
Leave all other settings at their default values (Figure 2) and click Next
Specify the report file name. We recommend putting the exported files in their own folder, which allows you to move the folder instead of all the files individually.
Click Finish (Figure 2)
Figure 2. Specify output folder and file name
The export generates 9 files in the folder, including a project file (.ppj), an annotation file, a summary file, and 3 sets of Partek spreadsheet files; each spreadsheet consists of 2 files.
To open the report, launch Partek Genomics Suite, choose File > Open Project, and browse to the .ppj file to open it. Three spreadsheets will open (Figure 3).
Figure 3. Open project in Partek Genomics Suite
Spreadsheet 1 contains genotype calls, spreadsheet 2 contains the log R ratio (copy number in log scale), and spreadsheet 3 contains the B allele frequency.
To perform copy number analysis, select spreadsheet 2 (log R ratio), choose the Copy Number workflow, and start from the QA/QC section. The genotype spreadsheet is used for the Association and LOH workflows.
The GenomeStudio plug-in lets you export data into a project that can be opened directly in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina data into Partek Genomics Suite.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
This user guide describes how to export gene expression data using Partek's Report Plug-in for Illumina GenomeStudio Gene Expression Module for use in Partek Genomics Suite. The GenomeStudio plug-in lets you export data into a project that can be directly opened in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina gene expression data into Partek Genomics Suite.
Download the plug-in zip file
Unzip the file. It contains a folder called PartekReport, which holds two .dll files (Partek.Common.dll and Partek.GeneExpression.GenomeStudio.dll). Move the PartekReport folder to
C:\Program Files (x86)\Illumina\GenomeStudio\Modules\BSGX\ReportPlugins. If there is no ReportPlugins folder in the BSGX folder, create one; the path and folder names must exactly match those given above (Figure 1).
Figure 1. Place PartekReport folder in the appropriate directory
In GenomeStudio gene expression project:
Choose Analysis > Reports... from the main menu
Select Custom Report and choose Partek Report Plug-in from the drop-down list
Specify AnnotationName; do NOT include <> in the name. You can use the same name as the .bgx file you imported the data with, or a name unique to your dataset
Choose Type by clicking on the cell; the default is gene level
Leave all other settings at their default values (Figure 2)
Specify the report file name. We recommend putting the exported files in their own folder, which allows you to move the folder instead of all the files individually.
Click OK
Figure 2. Configuring the GenomeStudio gene expression report dialog
Five files are exported, including a project file (.ppj), which can be opened directly in Partek Genomics Suite. The project file opens the signal intensity data in a spreadsheet and associates the annotation information with the intensity spreadsheet. All intensities are log2 transformed. If there are negative values in AVG_Signal, the data are first shifted so that the lowest value is one and then log2 transformed.
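The transformation described above can be sketched in a few lines. This is an interpretation rather than the plug-in's actual code: it assumes that "shifted" means adding a constant so that the smallest value becomes one, and that the shift is applied only when at least one value is negative. The function name `log2_with_shift` is hypothetical.

```python
import math


def log2_with_shift(values):
    """Log2-transform intensities, shifting first if any value is negative."""
    lowest = min(values)
    # Assumed rule: when negatives exist, move the minimum to exactly 1.
    shift = 1 - lowest if lowest < 0 else 0
    # Note: without negatives, a zero intensity would still fail in log2.
    return [math.log2(v + shift) for v in values]
```

With the input `[-3, 0, 4]` the shift is 4, giving `[1, 4, 8]` before the logarithm and `[0.0, 2.0, 3.0]` after it.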
To open the report, launch Partek Genomics Suite, choose File > Open Project, and browse to the .ppj file to open it. In the Gene Expression workflow, you can then proceed to the add sample attributes step.
This guide describes how to connect a client computer to a floating concurrent Partek Genomics Suite license server.
With administrative privileges, download the latest version of Partek Genomics Suite.
Once the download is completed, start the application by clicking on the Partek Genomics Suite icon. The default Partek License Manager window will appear. You will be prompted to provide a license (figure 1).
Select Add License.
Select the License server radio button.
Enter the Server Name and select Add.
You will need to obtain the server name from your license server administrator.
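If adding the license server fails, a quick network check can help separate connectivity problems from licensing problems. The sketch below simply tests whether a TCP connection can be opened; the function name `can_reach` is hypothetical. Port 27000 is used as the default because FlexNet license managers conventionally listen in the 27000 to 27009 range, but the port actually used by your server is an assumption here and should be confirmed with your license server administrator.

```python
import socket


def can_reach(host, port=27000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A result of `False` suggests checking the server name, the port, and any firewall between the client and the license server.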
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Windows: Under the control panel, click Add/Remove Programs and select Partek Genomics Suite and click Uninstall.
Macintosh: Delete the license.dat file from the /Users/Shared folder.
Windows, Linux, Macintosh:
1. Log on to the server as an administrator.
2. Navigate to the "C:\FLEXnet" folder and open lmtools.exe.
3. On the first tab - "Service/License File", make sure that the "Configuration using Services" radio button is selected and Partek FlexNet Service appears in the white box.
4. Go to the "Start/Stop/Reread" tab. You should see the same Partek FlexNet Service highlighted in the list of installed services. Check the "Force Server Shutdown" box, and click the "Stop Server" button.
5. Go to the "Config Services" tab, verify that the Partek FlexNet Service is selected in the "Service Name" box, and click "Remove Service". Select "Yes" when prompted.
6. If there are no other applications installed on the server licensed with FlexNet, you may safely delete the FlexNet folder and all of its contents.
There are many useful visualizations, annotations, and biological interpretation tools that can operate on a gene list. For these features to work with an imported list, an annotation file must be associated with the gene list. Additionally, many operations that work with a list of significant genes (like GO or Pathway Enrichment) require comparison against a background of “non-significant” genes. The quickest way to accomplish both is to use the background of “all genes” for that organism provided by an annotation source like RefSeq, Ensembl, etc. in .pannot (Partek annotation), .gff, .gtf, .bed, or tab- or comma-delimited format. If the file is not already in a tab- or comma-delimited format, you may import, modify, and save the file in the proper file format.
Select File from the main toolbar
Select Genomic Database under Import (Figure 1)
Select the annotation file; in this example, we select a .pannot file downloaded from Partek distributed library file repository – hg19_refseq_14_01_03_v2.pannot
Delete or rearrange the columns as necessary; we have placed the column with identifiers (should be unique ID) that correspond to our gene list first
Select File then Save As Text File... to save the annotation file; we have named it Annotation File (Figure 2)
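The column rearrangement in the steps above can also be done outside Partek Genomics Suite when the annotation source is already a delimited text file. The following is a minimal sketch under that assumption; the function name `move_column_first` is hypothetical, and it expects a header row containing the identifier column name. It is not a replacement for importing .pannot or other formats through the File menu.

```python
import csv
from itertools import chain


def move_column_first(in_path, out_path, id_column, delimiter="\t"):
    """Rewrite a delimited file as tab-delimited with `id_column` placed first."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader)
        idx = header.index(id_column)  # raises ValueError if the column is absent
        for row in chain([header], reader):
            # Put the identifier first and keep the other columns in order.
            writer.writerow([row[idx]] + row[:idx] + row[idx + 1:])
```

For a comma-delimited source, pass `delimiter=","`; the output is always written as tab-delimited text.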
Now we can add the annotation file to our imported gene list.
Right click 1 (gene_list.txt) in the spreadsheet tree
Select Properties from the pop-up menu
This brings up the Configure Genomic Properties dialog (Figure 3).
Select Browse under Annotation File
Choose the annotation file; we have chosen Annotation File.txt
If this is the first time you have used an annotation, the Configure Annotation dialog will launch. This is used to choose the columns with the chromosome number and position information for each feature. Our example annotation file has chromosome, start, and stop in separate columns.
Select the proper column configuration options (Figure 4)
Select Close to return to the Configure Genomic Properties dialog
Select Set Column: to open the Choose column with gene symbols or microRNA names dialog (Figure 5)
Select the appropriate column; here the default choice of 1. Symbol is appropriate
Select OK to return to the Configure Genomic Properties dialog
Select the appropriate species and genome build options; we have selected Homo sapiens and hg19 (Figure 6)
Select OK
The annotation file has been associated with the spreadsheet and additional tasks can now be performed on the data. For example, because the annotation includes genomic locations, you can draw a chromosome view of this data.
If an annotation file has been associated with a spreadsheet, annotations from the file can be added as columns in the spreadsheet when each identifier is on a row.
Right click on a column header
Select Insert Annotation
Select columns to add from Column Configuration; we have selected Chromosome, Start, and Stop (Figure 7)
Select OK
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
In Partek Genomics Suite:
Go to Edit > Preferences > File Locations
Set Temporary File Default Directory to a location on the alternate drive using Browse...
Click OK
Go to Tools > File Manager...
Set Default Library File Folder to a location on the alternate drive using Change...
Click OK
Using a fast SSD hard drive for the Temporary File and Library File folders will improve the performance of Partek Genomics Suite.
Continue with the installation by selecting "Yes" and proceed through the remaining prompts.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Select () to close the annotation file
Select () to save the spreadsheet
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
A list of SNPs using dbSNP IDs can be imported as a text file and associated with an annotation file as described for a list of genes. The annotation file you use to annotate the SNPs should minimally contain the chromosome number and physical position of each locus.
Novel SNPs or SNPs that are not found in your annotation source must be imported as a region list. For this, follow the procedure outlined in Starting with a list of genomic regions, but use the SNP name in place of a region name.
Starting with a list of SNPs that have been associated with genomic loci using an annotation file and assigned a species with genome build, you can use Find Overlapping Genes to annotate these SNPs with the closest genes.
Select Tools from the main toolbar
Select Find Overlapping Genes (Figure 1)
Figure 1. Adding overlapping genes to a SNP list
Select Add a New Column with the Gene Nearest to the Region from the method dialog
The Report Regions from the specified database dialog will open.
Select your preferred database. Be sure to match the species and genome build of your SNP list
Select OK
This will add 3 columns to the list of SNPs spreadsheet including Nearest Feature, which will indicate the nearest gene and strand (Figure 2).
Figure 2. Find Overlapping Genes adds three columns to a SNP list: overlapping features, nearest feature, and distance to nearest feature (bps)
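The idea behind the Nearest Feature and distance columns can be illustrated with a minimal Python sketch. This is not the Partek Genomics Suite implementation; the gene records, the nearest_gene function, and the rule that distance is 0 when a SNP lies inside a gene are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical gene record: (symbol, chromosome, start, stop, strand)
Gene = Tuple[str, str, int, int, str]


def nearest_gene(chrom: str, pos: int, genes: List[Gene]) -> Tuple[str, str, int]:
    """Return (symbol, strand, distance_bp) for the gene closest to a SNP.

    The distance is 0 when the SNP falls inside the gene (an assumption).
    """
    best = None
    for symbol, g_chrom, start, stop, strand in genes:
        if g_chrom != chrom:
            continue  # only genes on the same chromosome are candidates
        if start <= pos <= stop:
            dist = 0
        else:
            dist = min(abs(pos - start), abs(pos - stop))
        if best is None or dist < best[2]:
            best = (symbol, strand, dist)
    if best is None:
        raise ValueError(f"no genes on chromosome {chrom}")
    return best


# Made-up annotation entries used only to exercise the function
genes = [
    ("GENE_A", "1", 1000, 2000, "+"),
    ("GENE_B", "1", 5000, 6000, "-"),
    ("GENE_C", "2", 1000, 2000, "+"),
]
```

The species and genome build must match between the SNP list and the gene database for the coordinates to be comparable, which is why the database selection step above matters.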
To allow gene list operations such as GO Enrichment or Pathway Enrichment to be performed on the SNP list, we can set the Nearest Feature column as the gene symbol column for the spreadsheet.
Right click the spreadsheet in the spreadsheet tree
Select Properties from the pop-up menu
Select Gene symbol instead of Marker ID
Select Feature in column and select Nearest Feature (Figure 3)
Select OK
Figure 3. Setting Nearest Feature as the gene symbol allows gene list functions to be performed on a SNP list
If you have a SNP spreadsheet that was generated using Partek Genomics Suite (not imported as a .txt file), you can annotate the SNP list with gene, transcript, exon, and information about the predicted effect of the SNPs.
Select Tools from the main command toolbar
Select Annotate SNVs
Scientists often develop lists of genes, probes, transcripts, SNPs, and genomic regions of interest from analysis tools, research papers, and databases. Using Partek Genomics Suite, these lists can be integrated with genomics data sets, analyzed with powerful statistics, and visualized for new insights.
This user guide will illustrate:
This user guide does not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature in Partek Genomics Suite that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a Partek Genomics Suite feature on an imported list that you think should be included in this user guide, please let us know.
With gene ontology (GO) ANOVA, Partek Genomics Suite can use rigorous statistical analysis to find differentially expressed functional groupings of genes. Leveraging the Gene Ontology database, Partek Genomics Suite organizes genes into functional groups. GO ANOVA can detect not only up- and down-regulated functional groups, but also functional groups that are disrupted in only a few genes as a result of treatment. Moreover, the common vocabulary of the GO effort enables this analysis to be compared across all types of gene expression data, including those from other species. Traditional tests, such as GO enrichment, require defining filtered lists of differentially expressed genes followed by an analysis of functional groups related to those genes. GO ANOVA, by contrast, is performed directly after data import and normalization. This minimizes the risk that a highly stringent filter will cause important functional groups to be overlooked.
Other tests, such as gene set enrichment analysis (GSEA), tolerate minimal or no pre-filtering. However, these tests are very limited in their ability to integrate complicated experimental designs. GSEA, for example, can only handle two groups at a time. GO ANOVA, on the other hand, can leverage the wealth of sample information collected and use powerful multi-factor ANOVA statistics to analyze very complex interactions and regulatory events. The analysis output includes detailed statistical results specifying the effect and importance of phenotypic information on differential expression and subsequent disruption of Gene Ontology functional categories. Furthermore, GSEA calculates enrichment scores using a running-sum statistic on a ranked gene list. GO ANOVA takes into account more information by utilizing each sample’s expression values to calculate the enrichment score.
Note that the same principles apply to Pathway ANOVA, the only difference being the mapping file; GO ANOVA organizes genes into GO categories, while Pathway ANOVA looks at pathways.
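To make the running-sum statistic mentioned for GSEA concrete, here is a minimal Python sketch of its simplest, unweighted form: walking down the ranked list, the sum steps up at each gene in the set and down at each gene outside it, and the score is the largest deviation from zero. The function name and the unweighted step sizes are assumptions for illustration; published GSEA variants typically weight the steps, and this sketch is not part of Partek Genomics Suite.

```python
from typing import Iterable, List


def running_sum_score(ranked_genes: List[str], gene_set: Iterable[str]) -> float:
    """Unweighted running-sum enrichment score over a ranked gene list."""
    members = set(gene_set)
    n_hits = sum(1 for g in ranked_genes if g in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    extreme = 0.0
    for gene in ranked_genes:
        # Step up for a set member, down for a non-member
        running += 1.0 / n_hits if gene in members else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme
```

Note that only the rank order enters this calculation, which is the limitation the text contrasts with GO ANOVA's use of each sample's expression values.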
This user guide deals with the following topics:
The Partek Genomics Suite software and license resides on a single computer (laptop, desktop, or server - any type). Any user with an account on the computer can use the license. Remote access capability is not available. Windows and Macintosh platforms only.
The Partek Genomics Suite license is installed on one computer on a network, which becomes the "license server". Partek Genomics Suite software is installed on an unlimited number of computers, which use the network to access the license. Concurrent license use is limited to the number of seats purchased. The license server should remain on and accessible over the network at all times.
The Partek Genomics Suite license is installed on one computer and can be accessed locally or remotely by only one person at a time.
The preferred method for importing a generic data spreadsheet into Partek Genomics Suite is as a text file. Here, we illustrate importing a list of genes with p-value and fold-change from an experiment comparing two conditions.
Select File from the main toolbar
Select Import
Select Text (.csv .txt)...
Select the text file using the file browser to launch the Import .txt, .tsv, or .csv File dialog
The File Type section of the Import dialog includes a preview of the text file and import options (Figure 1).
The columns in the import file can be separated by a tab, comma, or any other character.
For most applications, the items on the list should be in rows while attributes or values should be in columns. If a list is oriented with items on columns, select Transpose the file to import a transposed spreadsheet.
Select Next > to move to the Data Type section
Select your data type; here we have chosen Genomic Data because it is a gene list (Figure 2)
We have also deselected Is the data log transformed (LOG_base (x+offset) ) ?
Selecting Genomic Data will open a dialog after import to configure properties for the imported list including selecting the type of genomic data, the location of genomic features in the spreadsheet, the annotation column with gene symbols, the chip or reference source and annotation file, the species, and reference genome build.
Select Next >
The next step is to identify where the data starts and where the optional header is found using Identify Column Labels, Start of Data (Figure 3). The line that contains the header (if present) must precede the data. If there are lines to be skipped in the file (like comments), they may only appear at the top of the file, before the header line or data begin.
If there are many comment lines at the start of the file, you may need to select View Next 5 Records to get to the row that contains the column header. If you accidentally move past the screen that contains the header or data rows, select View Previous 5 Records.
If there are missing numerical values or empty cells in your input list, insert a special character or symbol (?, N/A, NA, etc.) in the missing cells and specify that character in the Missing Data Representation section of the dialog. Only one symbol can be used to represent missing values; the default missing value indicator is ?.
If a header row is present, select Col Lbls to allow you to select a column header row
Select the row where the data begins using the Begin Data selector
If any cells have a missing value, you can signify this with a special symbol selected using the Missing Data Representation panel
Select Next >
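The layout rules described for Identify Column Labels, Start of Data (comment lines only at the top, header before data, one missing-value symbol) can be checked on a file before import with a short Python sketch. The read_list function, its parameters, and the sample content are hypothetical and only mimic the roles of Col Lbls, Begin Data, and Missing Data Representation; they are not part of Partek Genomics Suite.

```python
import csv
import io
from typing import Dict, List, Optional


def read_list(text: str, delimiter: str = "\t", header_row: int = 0,
              data_row: int = 1, missing: str = "?") -> List[Dict[str, Optional[str]]]:
    """Parse a delimited list.

    header_row and data_row are 0-based line indexes (the roles played by
    Col Lbls and Begin Data); cells equal to `missing` become None.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = rows[header_row]
    records = []
    for row in rows[data_row:]:
        record = {name: (None if value == missing else value)
                  for name, value in zip(header, row)}
        records.append(record)
    return records


# Made-up file content: one comment line, a header line, then data
sample = (
    "# exported from an analysis tool\n"
    "Symbol\tp-value\tFold-Change\n"
    "TP53\t0.001\t2.5\n"
    "BRCA1\t?\t-1.8\n"
)
records = read_list(sample, header_row=1, data_row=2)
```

Here the comment line is skipped simply by pointing header_row past it, which mirrors why comment lines may only appear before the header.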
The Preview text encoding section (Figure 4) previews the first five lines of the file, allowing you to check if the text encoding is correct.
If the text does not appear properly, use the Specify the text encoding: drop-down menu to choose the correct encoding
Select Next >
The final section of the Import .txt, .tsv, or .csv File dialog is Verify Type & Attribute of Data Columns (Figure 5). While data column type and attribute can be modified after import, it is easier and faster to select the proper options during import as multiple columns may be selected during this dialog.
Check and modify column types and attributes
If there is an identifier like gene symbol or SNP, the Type field for that column should be set to text and Attribute should be set to label. Numeric values (intensities, p-values, fold-changes, etc.) should have Type set to double and Attribute set to response. The other possible value for Attribute is factor, which describes sample data. The user interface in this dialog allows you to select multiple columns at once (Ctrl+left click and Shift+left click). The interface controls are detailed in the dialog (Figure 5).
Select Finish to import the text file and open it as a spreadsheet
If Genomic data was selected in the Data Type section, the Configure Genomic Properties dialog will open (Figure 6). These options will be discussed in the next section when we add an annotation file.
Select OK
The imported spreadsheet will open (Figure 7).
Hierarchical clustering groups similar objects into clusters. To start, each row and/or column is considered a cluster. The two most similar clusters are then combined and this process is iterated until all objects are in the same cluster. Hierarchical clustering displays the resulting hierarchy of the clusters in a tree called a dendrogram. Hierarchical clustering is useful for exploratory analysis because it shows how samples group together based on similarity of features.
Hierarchical clustering is an unsupervised clustering method. Unsupervised clustering methods do not take the identity or attributes of samples into account when clustering. This means that experimental variables such as treatment, phenotype, tissue, number of expected groups, etc. do not guide or bias cluster building. Supervised clustering methods do consider experimental variables when building clusters.
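The merge procedure described above (start with one cluster per object, repeatedly combine the two most similar clusters) can be sketched in a few lines of Python. Euclidean distance and average linkage are assumptions chosen for illustration; the text does not state which dissimilarity measure or linkage Partek Genomics Suite uses, and this sketch is not its implementation.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1: List[int], c2: List[int],
                    data: Sequence[Sequence[float]]) -> float:
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clustering(data: Sequence[Sequence[float]]
                            ) -> List[Tuple[List[int], List[int], float]]:
    """Return the merges as (cluster_a, cluster_b, distance), in merge order."""
    clusters = [[i] for i in range(len(data))]  # every object starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Four made-up samples with two features each; two obvious groups
samples = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
merges = hierarchical_clustering(samples)
```

The list of merges is the information a dendrogram draws: each merge is a node, and the merge distance sets the height of that node. No sample attribute is consulted, which is what makes the method unsupervised.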
To illustrate the capabilities and customization options of hierarchical clustering in Partek Genomics Suite, we will explore an example of hierarchical clustering drawn from the tutorial Gene Expression Analysis. The data set in this tutorial includes gene expression data from patients with or without Down syndrome. Using this data set, 23 highly differentially expressed genes between Down syndrome and normal patient tissues were identified. These 23 differentially regulated genes were then used to perform hierarchical clustering of the samples. Follow the steps outlined in Performing hierarchical clustering to perform hierarchical clustering and launch the Hierarchical Clustering tab (Figure 1).
Figure 1. Heatmap showing results of hierarchical clustering
The right-hand section of the Hierarchical Clustering tab is a heat map showing relative expression of the genes in the list used to perform clustering. The heat map can be configured using the properties panel on the left-hand side of the tab. In this example, the low expression value is colored in green, the high expression value is in red, and the mid-point value between min and max is colored in black. The dendrograms on the left-hand side and top of the heat map show clustering of samples as rows and features (probes/genes in this example) as columns. Columns are labeled with the gene symbol if there is enough space for every gene to be annotated. Rows are colored based on the groups of the first sample categorical attribute in the source spreadsheet. The sample legend below the heat map indicates which colors correspond to which attribute group. In this example, Down syndrome patient samples are red and normal patient samples are orange.
The heat map can be configured using the properties panel on the left-hand side of the Hierarchical clustering tab.
Select the Rows tab
Verify that Type appears in the annotation box
Set Width (in pixels) to 25
This will increase the width of the color box indicating sample Type.
Select Show Label
Set Text size to 12
Set Text angle to 90
This angle is relative to the x-axis. When set to 90, the text will run along the y-axis.
Select Apply
The sample attributes are now labeled with group titles (Figure 2).
Figure 2. Labeling heat map with sample attribute groups
Select the Rows tab
Select Tissue from the New Annotation drop-down menu
Select Apply
Color blocks indicating the tissue of each sample have been added to the row labels and sample legend (Figure 3).
Figure 3. Sample attributes can be added to the heat map as sample labels
By default, Partek Genomics Suite displays samples on rows and features on columns. We can transpose the heat map using the Heat Map tab in the plot properties panel.
Select the Heat Map tab
Select Transpose rows and columns in the Orientation section
Select Apply
The plot has been transposed with samples on columns and features on rows. The label for the sample groups is now in the vertical orientation because the settings we applied to Rows have been applied to Columns.
Select the Columns tab
Select the Type track
Set Text angle to 0
Select Apply
The sample group label for Type is now visible (Figure 4).
Figure 4. Heat map columns and rows can be transposed
Each cluster node has two sub-cluster branches (legs), except at the bottom level of the dendrogram. The order of the two branches is arbitrary, so the positions of the two sub-clusters can be flipped within the cluster. This does not change the clustering, only the position of the clusters on the plot.
Clicking on a line (or drawing a bounding box on a line using the left mouse button) that represents a sub-cluster branch (or dendrogram leg) will flip the selected leg with the other leg within the same parent cluster. In this example, clicking on the bottom line will move it to the top of the heat map (Figure 5).
Figure 5. Rows and columns can be flipped by using Flip Mode to select dendrogram legs
The minimum, maximum, and midpoint colors of the heat map intensity plot can be customized.
Select the Heat Map tab
Select Apply
The heat map and plot intensity legend now show maximum values in yellow and minimum values in light blue with a black midpoint (Figure 6). The data range can also be customized by changing the values of Min and Max.
Figure 6. Heat map colors for minimum, maximum, and midpoint intensity can be customized
We can use the hierarchical clustering heat map to examine groups of genes that exhibit similar expression patterns. For example, genes that are up-regulated in Down syndrome samples and down-regulated in normal samples.
Select the middle cluster of the rows dendrogram as shown (Figure 7) by clicking on the line or drawing a bounding box around the line
The lines within the selected cluster will be bold and the corresponding columns (or rows) on the spreadsheet in the analysis tab will be highlighted.
Figure 7. Selecting a dendrogram cluster using Selection Mode
Right-click anywhere in the viewer
Select Zoom to Fit Selected Rows
The same steps can be used to zoom into columns or rows. Here, we have zoomed in on rows, but not columns to show the expression levels of the selected genes for all samples (Figure 8).
Figure 8. Viewing only selected genes for all samples
Left click anywhere in the hierarchical clustering plot to deselect the dendrogram
Partek Genomics Suite can export a list of genes from any cluster selected, allowing large gene sets to be filtered based on the results of hierarchical clustering.
Select the bottom cluster of the rows dendrogram
Right-click to open the pop-up menu
Select Create Row List... (Figure 9)
Figure 9. Creating gene list from selected cluster
Name the gene set down in normal
Select OK
Save the list as down in normal
In the Analysis tab, there is now a spreadsheet row_list (down in normal.txt) containing the 6 genes that were in the selected cluster. The same steps can be used to create a list of samples from the hierarchical clustering by selecting clusters on the sample dendrogram.
Once you have created a customized plot, you can save the plot properties as a template for future hierarchical clustering analyses.
Select the Save/Load tab
Select Save current...
Name the current plot properties template; we selected Transposed Blue and Yellow
The new template now appears in the Save/Load panel as an option. To load a template, select it in the Save/Load panel and select Load selected. Note that all properties will be applied, including Min and Max values and sample groups (based on the column number of the attribute in the source spreadsheet), which may not be appropriate for a different data set.
The hierarchical clustering plot can be exported as a publication quality image.
Select the Hierarchical Clustering tab
Select File from the main toolbar
Select Save Image As... from the drop-down menu
Select a destination and name for the file
Select PNG or your preferred image type from the pull-down menu
Select Save
Select the Flip Mode icon from the Mouse Mode icon set to activate Flip Mode
Set Min color to light blue using the color picker tool
Set Max color to yellow using the color picker tool
Select the Selection Mode icon from the Mouse Mode icon set to activate Selection Mode
To reset zoom, select the reset icon on the y-axis to show all rows and on the x-axis to show all columns.
Select the reset icon on the y-axis to show all rows
Select the Selection Mode icon from the Mouse Mode icon set to activate Selection Mode
Because these features require intensity (or count) data as well as experimental groups, they cannot be performed on imported lists.
If the data from imported spreadsheets has been associated with annotations, several integration approaches may be used to integrate multiple kinds of imported data.
The Genome Browser may be used to display data from multiple spreadsheets/experiments regardless of the type of spreadsheets (imported data or microarray or NGS experiments).
The Venn Diagram tool may be used to find overlaps based on a feature name.
The Find Overlapping Regions tool can use an imported gene list and a list of regions from a copy number or ChIP-Seq experiment to identify genomic regions in common.
This User Guide did not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a feature on an imported list that you think should be included in this User Guide, please let us know.
The Gene Ontology (GO) Enrichment p-value calculation uses either a Chi-Square or Fisher's Exact test to compare the genes included in the significant gene list to all possible genes present in the experiment, known as the background genes. For a microarray experiment, the background genes consist of all genes on the chip/array; for a next generation sequencing experiment, all genes in the species transcriptome are considered background genes.
Because the calculation is essentially comparing overlapping sets of genes and does not use intensity values, GO Enrichment can be performed on an imported gene list even without any numerical values. GO Enrichment is available through the Gene Expression workflow.
If no annotation file has been specified for the gene list, GO Enrichment will use the full species transcriptome as the background genes. While suitable for next generation sequencing experiments, for microarray experiments, only the genes on the chip/array are appropriate. Please contact our technical support department for assistance with this step if needed.
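A minimal Python sketch shows how a Fisher's Exact style enrichment p-value depends on the background, computed here as a one-sided hypergeometric tail. The function name, the one-sided form, and the assumption that the significant list is a subset of the background are illustrative choices, not a description of the Partek Genomics Suite calculation.

```python
from math import comb


def enrichment_p_value(list_in_category: int, list_size: int,
                       background_in_category: int, background_size: int) -> float:
    """P(at least `list_in_category` category genes in a random list).

    Assumes the gene list was drawn from the background genes.
    """
    k, n = list_in_category, list_size
    big_k, big_n = background_in_category, background_size
    upper = min(n, big_k)
    total = comb(big_n, n)  # number of possible gene lists of this size
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
               for i in range(k, upper + 1))
    return tail / total
```

For example, enrichment_p_value(5, 5, 5, 10) asks how likely it is that all 5 listed genes fall in a category containing 5 of 10 background genes. Changing background_size while holding the other counts fixed changes the p-value, which is why the choice between the chip/array gene set and the full transcriptome matters.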
Like GO Enrichment, Pathway Enrichment does not require numerical values, but instead operates on lists of genes - a list of significant genes vs. background genes. Consequently, Pathway Enrichment may be used with an imported list of genes even without any numerical values. The list of background genes is set to the species transcriptome by default, but can be set to a specific set of genes if the gene list has been associated with an annotation file.
A gene list can be used to filter another spreadsheet. As an example, we will filter the results of an ANOVA on microarray data using a gene list. This will create a spreadsheet with ANOVA results for only the genes included in our gene list.
Open the filtering gene list and target spreadsheets
Select the target spreadsheet in the spreadsheet tree; in this example, genes are on rows in the ANOVA result spreadsheet
Select Filter from the main toolbar
Select Filter Rows Based on a List... from Filter Rows (Figure 1)
Select the matching column of your target spreadsheet from the Key column drop-down menu; here we have selected 4. Gene Symbol (Figure 2)
Select the filtering gene list from the Filter based on spreadsheet drop-down menu; here we have selected 1 (Gene List.txt)
Select the matching column of your filtering gene list from the Key column drop-down menu; here we have selected 1. Symbol
Select OK to apply the filter
The target spreadsheet will display the filtered rows (Figure 3). Note that the number of rows has gone from 22,283 prior to filtering (Figure 1) to 153 after filtering (Figure 3).
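The filtering logic amounts to keeping the rows whose key column value appears in the gene list. A minimal Python sketch, with hypothetical row dictionaries and column names standing in for the target spreadsheet and its Key column:

```python
from typing import Dict, Iterable, List


def filter_rows_by_list(rows: List[Dict[str, str]], key_column: str,
                        gene_list: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only rows whose key column value is in the gene list."""
    wanted = set(gene_list)  # set membership makes each lookup fast
    return [row for row in rows if row[key_column] in wanted]


# Made-up ANOVA result rows and gene list, for illustration only
anova_rows = [
    {"Gene Symbol": "TP53", "p-value": "0.001"},
    {"Gene Symbol": "ACTB", "p-value": "0.700"},
    {"Gene Symbol": "BRCA1", "p-value": "0.020"},
]
filtered = filter_rows_by_list(anova_rows, "Gene Symbol", ["TP53", "BRCA1", "MYC"])
```

Matching is exact in this sketch, so differences in capitalization or symbol aliases between the two spreadsheets would cause rows to be missed; that is why the two key columns must hold the same kind of identifier.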
To use this filtered list for downstream analysis, we can save it.
Right-click the open spreadsheet in the spreadsheet tree
Select Clone...
Use the Clone Spreadsheet dialog to name the new spreadsheet and choose its place in the spreadsheet hierarchy
Select OK
The new spreadsheet will open. If you want to use the new spreadsheet again in the future, be sure to save it.
If your imported data contains a list of p-values, you can use any of the available multiple test corrections.
Select Stat from the main toolbar
Select Multiple Test
Select Multiple Test Corrections to launch a dialog with available options (Figure 4); corrected p-value column(s) will be added to the right of the selected p-value column(s)
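As an illustration of what a multiple test correction does to a column of p-values, here is a minimal Python sketch of the Benjamini-Hochberg step-up false discovery rate adjustment. It is one widely used correction; whether and under what name it appears in the dialog is not stated here, so treat the choice of method as an assumption.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indexes by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The adjusted values line up with the input rows, analogous to a corrected p-value column placed next to the original column.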
A variety of profile plots can be used to visualize the numerical data associated with your imported gene list.
Select View from the main toolbar
Select any applicable option
If you have imported numerical data associated with genes (like p-values or fold-changes), you can visualize these values in the Genome Browser once an annotation file that contains genomic location information is associated with the spreadsheet.
Right-click on a row header in the imported gene list spreadsheet
Select Browse to location
If the annotations have been configured properly, you should see a Regions track for the first column of numerical data, a cytoband track, and an annotation track. You can also add another track to display a second column of numerical data.
Select New Track
Select Add a track from spreadsheet
Select Next >
A new track titled Regions will be added.
Select Regions in the track preferences panel to edit it
Select the other numerical column in the Bar height by drop-down menu
For a gene list with expression values on each sample, clustering can be performed. Access the clustering function through the toolbar, not from a workflow. The workflow implementations assume that the data to be clustered are found on a parent spreadsheet and the list of genes is in a child spreadsheet.
Select Tools from the main toolbar
Select Discover then Hierarchical Clustering
Hierarchical Clustering assumes that samples are rows and genes are columns so consider transposing your data if this is not the case. If you have only one column or row of data, cluster only on the dimension with multiple categories by deselecting either Rows or Columns from What to Cluster in the Hierarchical Clustering dialog.
A region list must contain the chromosome, start location, and stop locations as the first three columns. The chromosome number in the region list must be compatible with the genomic annotation for the species if you plan to use any feature (like motif detection) that requires reference sequence information.
Import the region list as described above for text files with the following options
Select Other for data type
Set chromosome as a text field
Set location start and stop as either integer or text fields
Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select List of genomic regions from the Configure Spreadsheet dialog to add region to the properties (Figure 1)
Figure 1. Adding region to the properties of a spreadsheet
The spreadsheet properties will now include region. Alternatively, region can be added as a spreadsheet property from the Configure Genomic Properties dialog by selecting Advanced..., choosing region from the drop-down menu, selecting Add, and selecting OK.
If you would like to do any operation that requires looking up the reference genomic sequence information for the regions based on genomic location, you will need to specify the species for this region list.
Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select species from the Add Property drop-down menu and click Add
Specify the Species Name and Genome Build from the drop-down menus
Select OK
Starting with a region list, you may detect either known or de novo motifs using the ChIP-Seq workflow if your spreadsheet has been associated with a species and a reference genome.
Select ChIP-Seq from the Workflows drop-down menu
Select Motif detection from the Peak Analysis section of the workflow
Both Discover de novo motifs and Search for known motifs can be performed. Motif detection requires sequence information for the genome; you can specify either a .2bit file or a .fa file, which can be used to create a .2bit file.
If you have a region list or a .BED file and you have a microarray experiment with data, you can summarize the microarray data by the genomic coordinates contained in the region list. For example, the region list contains a list of CpG islands, the experiment contains methylation percentage values for probes (β values), and you would like to summarize the methylation values of all probes in each CpG island.
Import the region list (or .BED file)
Be sure that you have added the region property. The list of region coordinates (chromosome, start, stop) from the region list will be mapped against the reference genome specified for the microarray data so specifying Species and Genome Build for your region list is unnecessary.
Open the microarray data spreadsheet; this spreadsheet should have an annotation file associated with it, and the annotation file should contain genomic location information.
Samples should be on rows and data on columns in the microarray data spreadsheet.
Select the region list spreadsheet
Right-click any column header in the region list spreadsheet
Select Insert Average from the pop-up menu (Figure 2)
Figure 2. Adding the average values for a region list
Select the microarray data spreadsheet containing the values you want to average for each region from the Get average from spreadsheet drop-down menu
There are three options for averaging the data (Figure 3). Mean of samples significant in region is used when the region list has SampleIDs from the microarray data set associated with each region. In this case, only the microarray data set samples specified for each region would be included in the mean calculation. Mean of all samples will add columns for the mean value of all probes for all samples and the number of probes for all samples in each region. Mean value for all samples separately will add two columns for each sample with the mean value of all probes for that sample and the number of probes for that sample in each region.
We have selected Mean value for all samples
Select OK (Figure 3)
Figure 3. Selecting options for adding average values for regions
Columns will be added to the regions list spreadsheet. Here, we have added two columns with the average β-value for all samples in each CpG island and the number of probes in each CpG island (Figure 4).
Figure 4. Added average beta values and number of probes per CpG island
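The summarization behind the added columns (a mean value and a probe count per region) can be sketched in Python. The tuples, the mean_per_region function, and the inclusive treatment of region boundaries are assumptions for illustration and do not describe the Partek Genomics Suite implementation.

```python
from typing import List, Optional, Tuple

Region = Tuple[str, int, int]    # (chromosome, start, stop)
Probe = Tuple[str, int, float]   # (chromosome, position, value)


def mean_per_region(regions: List[Region],
                    probes: List[Probe]) -> List[Tuple[Optional[float], int]]:
    """For each region, return (mean probe value or None, number of probes)."""
    results = []
    for chrom, start, stop in regions:
        values = [value for p_chrom, pos, value in probes
                  if p_chrom == chrom and start <= pos <= stop]
        mean = sum(values) / len(values) if values else None
        results.append((mean, len(values)))
    return results


# Made-up CpG island coordinates and probe beta values
islands = [("1", 100, 200), ("1", 500, 600)]
beta_values = [("1", 150, 0.25), ("1", 180, 0.75), ("2", 150, 0.9)]
summary = mean_per_region(islands, beta_values)
```

A region with no probes gets a count of 0 and no mean, so the probe count column is worth checking before interpreting an average.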
If you have two or more region lists with coordinates on the same reference genome, you can compare them to identify overlapping regions.
Open all region list spreadsheets that you want to compare
Select Tools from the main toolbar
Select Find Region Overlaps (Figure 5)
Figure 5. Selecting Find Region Overlaps
The Find Region Overlaps tool has two modes of operation. The first, Report all regions, creates a new spreadsheet with any regions that did not intersect and all regions of intersection between any of the input lists. For each intersection, the start and stop coordinates of the intersection and the percent overlap between the intersected region with each of the regions in the input lists are reported. The second, Only report regions present in all lists creates a new spreadsheet with the intersected regions found in all the lists.
Select your preferred mode; we have selected Only report regions present in all lists
Select Add New Spreadsheet to add any spreadsheets you want to compare; we are comparing two region list spreadsheets (Figure 6)
Select OK
Figure 6. Configuring Find Overlapping Regions
A new region list spreadsheet will be created (Figure 7). The new region list is a temporary spreadsheet so be sure to save it if you want to keep it.
Figure 7. Spreadsheet with regions present in all lists
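For two lists, the intersection and percent overlap reported by this kind of comparison can be sketched in Python. The function names and the half-open treatment of coordinates (stop not included in the region) are assumptions for illustration, not the tool's documented behavior.

```python
from typing import List, Optional, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, stop), stop exclusive here
Overlap = Tuple[str, int, int, float, float]


def intersect(a: Region, b: Region) -> Optional[Overlap]:
    """Return (chrom, start, stop, percent_of_a, percent_of_b), or None."""
    if a[0] != b[0]:
        return None  # different chromosomes never overlap
    start, stop = max(a[1], b[1]), min(a[2], b[2])
    if start >= stop:
        return None
    length = stop - start
    return (a[0], start, stop,
            100.0 * length / (a[2] - a[1]),
            100.0 * length / (b[2] - b[1]))


def regions_in_both(list_a: List[Region], list_b: List[Region]) -> List[Overlap]:
    """All pairwise intersections between two region lists."""
    hits = []
    for a in list_a:
        for b in list_b:
            hit = intersect(a, b)
            if hit is not None:
                hits.append(hit)
    return hits
```

Because only coordinates are compared, both lists must refer to the same reference genome build for the result to be meaningful.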
To be annotated using the Annotate SNVs tool, an imported SNV position list must have four columns per locus:
Position of the SNP listed as chr.basePosition
Sample ID or name
The reference base
The SNP call (sample genotype base)
Prepare input list as shown (Figure 8) with four columns describing the position, sample, reference base, and sample genotype base for each SNV
Figure 8. An imported SNV list must follow this format to be annotated by the Annotate SNV tool. The first column must be the position and the position must follow the format shown, chr.basePosition
Save as either a tab-separated or comma-separated file
Import the table as a text file
Select Genomic data for What type of data is this file?
Set the position column Type to text
Set the other columns Type to categorical
Select Genomic location instead of marker IDs from the Choose the type of genomic data drop-down menu of the Configure Genomic Properties dialog
Specify the Species and Genome Build
Select OK
The Annotate SNVs tool can now be invoked on this spreadsheet to generate an annotation spreadsheet (Figure 9).
Figure 9. Annotate SNVs creates a new spreadsheet annotating each SNV from the source list
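Before importing, each row of the input list can be checked against the four-column format with a short Python sketch. The parse_snv_row function and the output field names are hypothetical; splitting the position at the last period is an assumption made so that chromosome names containing a period are still handled.

```python
from typing import Dict, List, Union


def parse_snv_row(row: List[str]) -> Dict[str, Union[str, int]]:
    """Validate one SNV row: position (chr.basePosition), sample, reference, call."""
    if len(row) != 4:
        raise ValueError("expected four columns per locus")
    position, sample, reference, call = row
    chrom, sep, base = position.rpartition(".")
    if not sep or not chrom or not base.isdigit():
        raise ValueError(f"position {position!r} is not in chr.basePosition form")
    return {
        "chromosome": chrom,
        "position": int(base),
        "sample": sample,
        "reference": reference,
        "call": call,
    }
```

Rows that raise an error can be corrected in the text file before import, since the position column must follow the format exactly.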
A BED (Browser Extensible Data) file is a special case of a region list: it is a tab-delimited text file whose first three columns contain the chromosome, start, and stop locations. To import a BED file to be used as a region list, follow the import instructions for region lists. A BED file can also be visualized as an annotation file containing regions in the Genome Browser.
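To show what the first three BED columns look like when read programmatically, here is a minimal Python sketch. The read_bed_regions function is hypothetical; skipping lines that begin with "track", "browser", or "#" reflects common BED header conventions, and any columns beyond the third are ignored.

```python
from typing import List, Tuple


def read_bed_regions(text: str) -> List[Tuple[str, int, int]]:
    """Return (chromosome, start, stop) for each data line of a BED file."""
    regions = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header or comment lines
        fields = line.split("\t")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


# Made-up BED content for illustration
bed_text = "track name=example\nchr1\t100\t200\nchr2\t50\t75\textra\n"
bed_regions = read_bed_regions(bed_text)
```

In the BED format, start coordinates are 0-based and stop coordinates are exclusive, which is worth keeping in mind when comparing BED regions with coordinates from other sources.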
BED files do not contain individual sequences nor do the regions have names. For example, the UCSC Genome Browser has an annotation BED file for CpG islands. You might like to view this information in the context of a methylation microarray data set. Before you can visualize a BED file in the chromosome viewer, you must create a Partek annotation file from the BED file.
Select Tools from the main toolbar
Select Annotation Manager... (Figure 1)
Figure 1. Selecting Annotation Manager
Select Create Annotation from the My Annotations tab of the Annotation Manager dialog (Figure 2)
Figure 2. Creating a new annotation file
Select BED file (.bed) for Annotation Type (Figure 3)
Figure 3. Selecting annotation file type
Select Browse... under Source to specify the BED file; a default new file name and destination will populate Result, but this can be changed
You can specify the name and save location of the new annotation file under Result; we typically choose the Microarray Libraries folder
Specify the Name of the annotation database file
Select the correct Species and Genome Build for the annotation file from the drop-down menus (Figure 4)
Figure 4. Configuring annotation file creation
Preview Chromosome Names is used when the chromosome names in the original file do not match those of the genome build and need to be modified. For our example, this is unnecessary.
Select OK to create the annotation
The Annotation Manager will display the new annotation in the My Annotations tab (Figure 5)
Figure 5. Viewing created annotation in My Annotations
In order to use a BED file as an Annotation track in the Genome Browser, first create the annotation file as described above, being careful to specify the correct species and genome build.
Right-click a row on any spreadsheet that has genomic features on rows (gene lists, ANOVA results, SNP detection)
Select either Browse to Row or Browse to Location to invoke the Genome Browser tab
Select New Track from the Tracks panel of the Genome Browser (Figure 6)
Figure 6. Adding a new track to the Genome Viewer
Select Add an annotation track with genomic features from a selected annotation source from the Track Wizard dialog (Figure 7)
Figure 7. Track Wizard dialog
Select Next >
Choose the annotation file you created; here we have selected UCSC CpG Islands (Figure 8)
If your annotation file does not contain strand information for each region, deselect Separate Strands; here we have deselected it
Figure 8. Choosing the annotation file
Select Create
A new track will be created from the annotation file (Figure 9). If Separate Strands had been selected, there would be two tracks, one for each strand, like we see for the RefSeq Transcripts - 2014-01-03 (+) and (-) tracks (Figure 8).
Figure 9. Viewing the added annotation file as a track in the genome viewer
For Partek Genomics Suite to recognize an annotation spreadsheet, it must meet several requirements. First, there must be a column header row in the annotation file. Second, there must be a column in the annotation file that matches the identifiers in your data spreadsheet. Third, any text field above the column header row must start with #. Fourth, the text fields must be tab or comma delimited.
We will illustrate associating a spreadsheet with an annotation file using an imported .txt data file from an Illumina HumanHT-12 v4.0 Gene Expression BeadChip array and the HumanHT-12 v4.0 Whole-Genome Manifest File (TXT Format) from Illumina.
Open the annotation file with a text editor such as Notepad++, WordPad, or TextEdit
Microsoft Excel is not recommended for viewing or editing text files because, with default settings, it converts some gene names to dates and others to floating-point numbers
Verify that a column in the annotation file matches the identifier in your data spreadsheet (e.g. probe ID); the identifier must be unique to each row
Remove the text before the column header row (Figure 1) or add # to the start of each text field above it
Save the annotation file as a .txt file
Figure 1. The HumanHT-12 v4.0 Gene Expression BeadChip annotation file contains several rows of information prior to the column header row. To use this annotation file in Partek Genomics Suite, we delete any rows prior to the column headers row.
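If you prefer to keep the information above the header rather than delete it, the alternative of prefixing it with # can be scripted. The Python sketch below is one way to do this, under the assumption that the header row is the first line containing the identifier column name as one of its tab- or comma-separated fields. The function name and the example lines are hypothetical.

```python
def comment_out_preamble(lines, id_column):
    """Prefix every line that precedes the column header row with '#'.

    Assumption: the header row is the first line that has id_column
    as one of its tab- or comma-separated fields.
    """
    out = []
    header_seen = False
    for line in lines:
        if not header_seen:
            fields = line.rstrip("\r\n").replace(",", "\t").split("\t")
            if id_column in fields:
                header_seen = True
            elif not line.startswith("#"):
                line = "#" + line
        out.append(line)
    if not header_seen:
        raise ValueError(f"no header row containing {id_column!r} was found")
    return out


# Hypothetical file content, for illustration only
raw = [
    "Descriptive text about the array\n",
    "More descriptive text\n",
    "Probe_Id\tILMN_Gene\tChromosome\n",
    "probe_1\tgene_1\t1\n",
]
print("".join(comment_out_preamble(raw, "Probe_Id")))
```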
Right-click the spreadsheet you want to annotate in the spreadsheet tree panel and select Properties from the pop-up menu (Figure 2), or select Properties from the File menu on the main toolbar
Figure 2. Changing the spreadsheet properties
Depending on how you imported the data, you may see a Configure Spreadsheet dialog (Figure 3). Select the most appropriate option for your data; here we have chosen Genomic microarray.
Figure 3. The Configure Spreadsheet dialog may appear depending on how you imported your data
The Configure Genomic Properties dialog will now open.
Select the appropriate option for Choose the type of genomic data; here we have chosen Gene Expression (Figure 4).
Figure 4. Selecting the type of genomic data
Select the appropriate options for Location of genomic features in spreadsheet
Selecting Gene Symbol instead of Marker ID allows biological interpretation tasks like GO Enrichment or Pathway Enrichment to be performed without an annotation file because the gene symbol can be used to look up the gene set or pathway database.
Location of genomic features in spreadsheet allows you to specify whether genomic features (e.g. genes, miRNAs, probes, SNPs, CpGs etc) are represented by columns or rows. For Feature in column label, each feature is on a column, each row is a sample. For Feature in column, each feature is on a row and the feature ID for each feature is located in the column chosen with the drop-down menu.
Choose chips/reference and annotation files allows you to specify an annotation file to associate with the spreadsheet.
Select Browse... from Choose chips/references and annotation files
Select your annotation spreadsheet file using the file selection interface
If the genomic position information from the annotation file cannot be automatically parsed, the Configure Annotation dialog will launch. This dialog allows you to choose which columns in the annotation file give the identity and genomic location of the features in your data spreadsheet. There are four options depending on if and how chromosome coordinates are described in the annotation file.
Select the appropriate option for your annotation file; we have selected Chromosome is in one column and the physical position is in another column (eg: chr1, 100 or chr1, 100-200)
The Choose the columns section displays the annotation file spreadsheet with options to choose which columns are the Marker ID, Chromosome, and Physical Position (Figure 5).
Select the column that matches the feature IDs in your data spreadsheet for Marker ID; we have chosen Probe_Id.
Select the column(s) that match the chromosome location data; we have chosen Chromosome for Chromosome and Probe_Coordinates for Physical Position.
Select Close to return to the Configure Genomic Properties dialog
An index file for the genomic location data of the annotation file is generated in the same folder as the annotation file; it has the same file name as the annotation file, but the file extension .idx. If you need to re-configure the genomic location field in the annotation file, first manually delete the .idx file and re-do the above steps to generate a new index file for the annotation file.
Figure 5. Specifying the columns that contain the genomic locations of markers in the annotation file
The Chip/Reference text field will be populated with the annotation file name. You can edit this text field if you wish.
For the Annotation column with gene symbols or miRNA names section, if Gene Symbol instead of Marker ID is selected, this field is automatically populated with the gene symbol column; however, if it is not selected, you will need to manually specify the column in the annotation file that corresponds to gene symbols or miRNA names.
Select Set Column:
Select the appropriate column from the dialog; here we have selected ILMN_gene (Figure 6)
Select OK
Figure 6. Choosing the annotation column with gene symbols
Species and gene symbol information is required for biological interpretation analysis.
Select the correct species and genome build from the drop-down menus; we have chosen Homo sapiens and hg19 (Figure 7)
Select OK to apply the annotation file to your data spreadsheet
Figure 7. Choosing annotation file using the Configure Genomic Properties dialog
To verify that the annotation has been added, we can try to add annotation information to the spreadsheet when the features are on rows in the spreadsheet.
Right-click on a column in the annotated data file spreadsheet
Select Insert Annotation from the pop-up menu (Figure 8)
Figure 8. Adding an annotation column to data spreadsheet
The Column Configuration section of the Add Rows/Columns to Spreadsheet dialog should contain all the feature annotations from the annotation file spreadsheet (Figure 9). Here we selected ILMN_Gene, which will add gene name information as a column next to 1. ID_REF.
Figure 9. Annotations from the annotation spreadsheet file should appear as options in the Column Configuration section of the Add Rows/Columns to Spreadsheet
Annotation files for most commercial arrays are available from the chip manufacturer. If you have a custom chip or want to use a customized annotation file, you can create an annotation file that will allow you to add annotations to your features (e.g. probe IDs) when the features are represented by rows on the spreadsheet. Your annotation file must meet the following criteria:
The annotation file must have a column header with a label for each column
A column in the annotation file must correspond to the feature ID column of your data spreadsheet
Any comments before the header must start with # or the header will not be recognized
The fields of the annotation file must be tab or comma delimited
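The criteria above can be checked before loading a custom annotation file. The following Python sketch is one possible pre-flight check, not a description of how Partek Genomics Suite validates files. The function name, the problem messages, and the example content are hypothetical; the uniqueness check reflects the requirement that each identifier be unique to its row.

```python
import csv


def check_annotation(lines, id_column, data_ids, delimiter="\t"):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Lines starting with '#' are comments and are skipped, as are blank lines
    content = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    rows = list(csv.reader(content, delimiter=delimiter))
    if not rows:
        return ["file has no header row"]
    header = rows[0]
    if any(not label.strip() for label in header):
        problems.append("a column has no label")
    if id_column not in header:
        problems.append(f"no column named {id_column!r}")
        return problems
    idx = header.index(id_column)
    ids = [row[idx] for row in rows[1:] if len(row) > idx]
    if len(ids) != len(set(ids)):
        problems.append("feature IDs are not unique")
    missing = set(data_ids) - set(ids)
    if missing:
        problems.append(f"{len(missing)} data IDs missing from annotation")
    return problems


# Hypothetical annotation content and feature IDs
good = ["# free-text comment\n", "ProbeID\tSymbol\n", "p1\tA\n", "p2\tB\n"]
print(check_annotation(good, "ProbeID", ["p1", "p2"]))
```

For a comma-delimited file, pass delimiter="," instead.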
To invoke a genome view of your data, your annotation file must also have one or more columns that contain the genomic location in a format that Partek Genomics Suite can recognize. The annotation file must also contain a column that has the chromosome and base pair location (start and stop or physical position). Cytoband and/or strand can also be included.
The table below provides possible column labels, a description of the format for that field, and an example.
Here are a few examples of the first two rows of annotation files:
Using Agilent format
Using Affymetrix SNPs format
Using Affymetrix exons format
The method used to detect changes in functional groups is ANOVA. For detailed information about ANOVA, see Chapter 11 of the Partek User Manual. There is one result per functional group based on the expression of all the genes contained in the group. Besides all the factors specified in the ANOVA model, the following extra terms will be added to the model by Partek Genomics Suite automatically:
Gene ID - Since not all genes in a functional group express at the same level, gene ID is added to the model to account for gene-to-gene differences
Factor * Gene ID (optional) - Interaction of gene ID with the factor can be added to detect changes within the expression of a GO category with respect to different levels of the factor, referred to in this document as the disruption of the category's expression pattern or simply disruption
Suppose there is an experiment to find genes differentially expressed in two tissues: two different tissues are taken from each patient, and a paired-sample t-test or 2-way ANOVA can be used to analyze the data. The GO ANOVA dialog allows you to specify the ANOVA model, which includes the two factors: tissue and participant ID. The analysis is performed at the gene level, but the result is displayed at the level of the functional group by averaging the member genes’ results. The equation of the model that can be specified is:
y = µ + T + P + ε
y: expression of a functional group
µ: average expression of the functional group
T: tissue-to-tissue effect
P: participant-to-participant effect (a random effect)
ε: error term
When the tissue is interacted with the gene ID then the ANOVA model becomes more complicated as demonstrated in the model below. The functional group result is not explicitly derived by averaging the member genes as the new model includes terms for both gene and group level results:
y = µ + T + P + G + T*G + ε
y: expression of a functional group
µ: average expression of the functional group
T: tissue-to-tissue effect
P: patient-to-patient effect (this can be specified as a random effect)
G: gene-to-gene effect (differential expression of genes within the functional group independent of tissue type)
T*G: Tissue-Gene interaction (differential patterning of gene expression in different tissue types)
ε: error term
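To make the T*G term concrete, here is a small numeric sketch in Python. It is not the ANOVA that Partek Genomics Suite fits: it ignores the participant effect P and the error term, and simply decomposes a table of hypothetical mean expression values (two tissues, two genes) into µ, T, G, and T*G components. All names and values are illustrative assumptions.

```python
def decompose(table):
    """Split table[tissue][gene] cell means into mu, T, G, and T*G components."""
    tissues = list(table)
    genes = list(next(iter(table.values())))
    n_t, n_g = len(tissues), len(genes)
    # Grand mean of the functional group
    mu = sum(table[t][g] for t in tissues for g in genes) / (n_t * n_g)
    # Tissue effect: tissue mean minus grand mean
    T = {t: sum(table[t][g] for g in genes) / n_g - mu for t in tissues}
    # Gene effect: gene mean minus grand mean
    G = {g: sum(table[t][g] for t in tissues) / n_t - mu for g in genes}
    # Interaction: what remains after removing mu, T, and G from each cell
    TG = {(t, g): table[t][g] - mu - T[t] - G[g] for t in tissues for g in genes}
    return mu, T, G, TG


# Hypothetical cell means: geneA is high in brain, geneB is high in heart
means = {
    "brain": {"geneA": 8.0, "geneB": 4.0},
    "heart": {"geneA": 4.0, "geneB": 8.0},
}
mu, T, G, TG = decompose(means)
print(mu, T, G, TG)
```

In this example the tissue effect T is zero for both tissues, so the group as a whole is not differentially expressed, while the T*G terms are non-zero. This is the situation that the interaction term is intended to detect.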
In the case that there is more than one data column mapping to the same gene symbol, Partek Genomics Suite will assume that the markers target different isoforms and will not treat the two markers as replicates of the same gene. Instead, each column is treated as a gene unto itself.
If there are only two samples in the spreadsheet, then Partek Genomics Suite cannot calculate a type by gene ID interaction. In this case, the result spreadsheet will contain a column labeled Disruption score. First, for each gene in the functional group Partek Genomics Suite will calculate the difference between the two samples. A z-test is used to compare the difference between each gene and the rest of the genes in the functional group. The disruption score is the minimum p-value from the z-tests comparing each gene to the rest in the functional group. A low disruption score therefore indicates that at least one gene behaves differently from the rest. This implies a change in the pattern of gene expression within the functional group and potential disruption of the normal operation of the group. The category as a whole may or may not exhibit differential expression in addition to the disruption.
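The disruption score calculation can be sketched in Python as follows. This is an illustration of the idea only: the exact z-test used by Partek Genomics Suite is not specified here, so the choice of a two-sided test and of the sample standard deviation of the remaining genes are assumptions, as are the function name and the example values.

```python
import math
from statistics import mean, stdev


def disruption_score(sample1, sample2):
    """Minimum two-sided z-test p-value comparing each gene's difference with the rest."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    if len(diffs) < 3:
        raise ValueError("need at least three genes to estimate the spread of the rest")
    p_values = []
    for i, d in enumerate(diffs):
        rest = diffs[:i] + diffs[i + 1:]
        sd = stdev(rest)
        if sd == 0:
            # Degenerate case: the rest are identical to each other
            p = 1.0 if d == mean(rest) else 0.0
        else:
            z = (d - mean(rest)) / sd
            # Two-sided p-value under a standard normal distribution
            p = math.erfc(abs(z) / math.sqrt(2))
        p_values.append(p)
    return min(p_values)


# Hypothetical expression values for four genes in two samples
uniform = disruption_score([5.1, 4.9, 5.1, 4.9], [5.0, 5.0, 5.0, 5.0])
disrupted = disruption_score([5.0, 5.1, 4.9, 9.0], [5.0, 5.0, 5.0, 5.0])
print(uniform, disrupted)
```

The second group contains one gene whose difference stands far apart from the others, so its score is much lower than that of the first group.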
Preparing a data set for analysis requires importing the data, normalizing the data as appropriate for standard gene expression analysis, and inserting columns containing the experimental variables. Checkout for more details about preparing data. It is not necessary to perform a differential analysis of gene expression before GO ANOVA.
For the sake of example, the following walkthrough will consider an experiment that has been imported which includes two different tissues, brain tissue and heart tissue, extracted from a small set of patients.
The GO ANOVA function is available in the Gene Expression, microRNA Expression, RNA-Seq, and miRNA-Seq workflows.
Select the Gene Expression workflow (or any of the other ones) from the Workflows drop-down on the upper right of the spreadsheet
Go to the Biological Interpretation section of the workflow
Select Gene Set Analysis (Figure 1) and then Gene Set ANOVA
Figure 1. GO ANOVA dialog can be invoked via Gene Set Analysis option of the workflow
For this example analysis, the model was kept easy to interpret by including Subject and Tissue as the only ANOVA factors. Additionally, Tissue was added to the Disruption Factor(s). Including Subject controlled for person-to-person variation, and including Tissue allowed the analysis of differential expression and of functional category disruption between tissue types. For the sake of simplicity and minimizing run time, the term Subject was not added to the Disruption Factor(s) box. Including it would have helped correct for subject-specific gene expression patterns, though the results were largely unaffected in this case.
Performing GO ANOVA analysis on very large GO categories can take quite a bit of time. More importantly, very large categories may have too large a scope to be useful. To speed the operation and analyze only smaller GO categories, specify 20 genes as the maximum size for an analyzed GO category.
For the sample dataset, the GO ANOVA dialog setup should appear as in Figure 2 below.
Figure 2. GO ANOVA configured for the user guide data set. Two factors added to the model
GO ANOVA output is very similar to standard ANOVA output except each row in the resulting sheet contains statistical results from a single GO functional group rather than a single gene. Columns can be broken down into four sections:
Annotations contain detail about the category being considered
ANOVA results contain the significance of the effect of the factors in the model
Contrast results contain significance and fold change of the difference between groups compared via contrast
F-ratios display the significance of the factors in the ANOVA model
Annotations will take up the first four columns of the results sheet (Figure 1). The first column (# of genes) is the number of genes in the GO category. Specifically, this is not necessarily the number of unique genes in the category; depending on the technology, it can be the number of probes or probe sets on the microarray whose targets fall into the GO category. Genes targeted more than once will be counted more than once. The second column (GO ID) is the unique numeric identifier of the GO category; it is useful for searching when the GO category has a very long name. The third column is the type of the GO category, while the fourth column (GO Description) is the name of the GO category.
Figure 1. GO ANOVA annotation columns (example)
Right-click any row header and choose Create Gene List to generate a new spreadsheet containing the list of genes (probes/probe sets) within the selected GO category.
ANOVA results will include a column for each factor in the setup (Figure 2). A column named with the factor or interaction followed by p-value reports how significant the effect of that variable is on the data. A lower p-value corresponds to a more significant effect. For example, a p-value of 0.1 for tissue means that, given the inherent variability of the measurements of the genes in the functional group, a difference between the tissues at least as large as the one observed would be expected 10% of the time if the tissues were in fact equivalent. A p-value of 0 occurs when the value is too small to be displayed. This can be caused by a very low estimate of inherent variability due to either a very small number of replicates or severely unbalanced data.
Figure 2. Viewing the GO ANOVA result
In the example experiment, a low p-value for tissue would imply the functional group is differentially expressed across tissues.
A low p-value for an interaction implies that the effect of one factor on the other is significant. In the example dataset, no interactions between two main variables were included as factors. To illustrate what the interaction p-value would mean, consider the case that a drug compound and a control injection were dosed over several time points and an interaction between injection compound and time point was included in the GO ANOVA. A low p-value for the drug-time point interaction corresponds to the effect of drug on the functional group being altered with time.
A column will also be present for each factor placed in the Disruption Factor(s) box. This column will have the header Disruption(Factor name). A low p-value in this column corresponds to the different states presenting with different gene patterns within the functional group. For functional groups containing only a single gene, no value will be present as the pattern cannot change. In the example experiment, a low p-value for Disruption(Tissue) represents functional categories which have different genes operating in the heart and in the brain.
Contrast results include four columns for each of the comparisons declared during GO ANOVA setup. The first column contains the p-value representing the significance of the difference between the two categories. The second column contains the ratio between the two groups where increases are represented as greater than one and decreases are represented as values between zero and one. The third column is the fold change of the functional group between the two categories where increases are greater than one and decreases are less than negative one. The fourth column contains a plain text description of the direction of the fold change. Fold changes and ratios represent the average change in the functional category. In the example, a contrast was run comparing expression in the cerebral tissue to the heart tissue (Figure 3). As these were the only tissues, the p-values are identical to those in column 5. While the p-value column shows which groups are differentially expressed between the tissues, the fold change columns allow us to see by how much they are differentially expressed. Using the sign of the fold change, or the description column, you can see which categories are increased in brain and which are increased in heart.
Figure 3. Viewing the GO ANOVA contrast columns
F-Ratios
F-ratios (Figure 4) are used in the computation of p-values. The values in the columns can safely be ignored by most users; there are exceptional cases when the F-ratios may be informative. To see the general significance of the factors included in the model, a Sources of Variation plot can be computed from these values from the View menu (or the Workflow). The higher the average F-ratio, the more important the factor is to the model on average.
Figure 4. Viewing the GO ANOVA F-ratios
The setup dialog for GO ANOVA can be found in the Biological Interpretation section of the expression workflows (Gene Expression, MicroRNA Expression, Exon, RNA-Seq, miRNA-Seq). It is recommended that GO ANOVA is run on the sheet with expression levels, after import and normalization, though GO ANOVA can be run on any spreadsheet with samples on rows and genes on columns. If a child spreadsheet is selected, such as the result of a prior ANOVA analysis, then the test will be automatically run on the parent spreadsheet.
Upon selecting GO ANOVA (Biological Interpretation > Gene Set Analysis), Partek Genomics Suite will first offer the opportunity to configure the parameters of the test and exclude functional groups with too few or too many genes (Figure 1). To save time when running GO ANOVA, the size of GO categories analyzed can be limited using the Restrict analysis to function groups with fewer than __ genes option. Large GO categories may be less interesting and also take the most time to analyze. We recommend restricting the analysis to groups with fewer than 150 genes, as this can make the analysis much quicker (and the results easier to interpret). In the current example, the maximum category size was set to only 20 genes, for demonstration purposes only.
Figure 1. Configure the parameters of the test: gene ontology categories with too few or too many genes can be excluded
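The effect of the size restriction can be pictured with a short Python sketch. This only mimics the idea of excluding groups with too few or too many genes; the function name, the default lower bound, and the example categories are hypothetical, and the upper bound of 150 is the recommendation given above.

```python
def restrict_by_size(categories, min_genes=2, max_genes=150):
    """Keep categories with at least min_genes and fewer than max_genes members."""
    return {
        name: genes
        for name, genes in categories.items()
        if min_genes <= len(genes) < max_genes
    }


# Hypothetical functional groups mapped to their member genes
categories = {
    "small": ["g1"],
    "medium": ["g1", "g2", "g3"],
    "large": [f"g{i}" for i in range(200)],
}
kept = restrict_by_size(categories)
print(list(kept))
```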
Figure 2. Setting the method of mapping genes to gene sets
To set up the GO ANOVA dialog, you must consider all factors that would normally be included in an ANOVA model analyzing gene expression among the samples (Figure 3). Briefly, this should include:
Experimental factors
Factors explaining sample dependence
Factors explaining noise
For more details on ANOVA, see Chapter 11 of the User’s Manual.
Figure 3. GO ANOVA setup dialog. Including a factor in the ANOVA model (ANOVA Factors) will identify gene ontology (GO) categories whose expression is different across the genes within the category, by the factor of interest. Including a factor as a Disruption Factor will identify GO categories where the expression of the genes within the category is affected but not uniformly across the genes within the category. Genes (probesets) can be excluded based on expression levels, to reduce the noise.
Factors inherent to the experiment include variables that would be considered as the experimental variables during experiment design. Generally this will include all variables necessary to answer the questions of the researcher. Examples may include factors such as tissue type, disease state, treatment, or dosage.
Sometimes factors do not act independently of each other. For example, different dosages of a drug may affect patients differently over time, or a drug may not affect tissues equally as in many toxicity studies. If the effect of one variable on the other is either suspected of occurring, or of particular interest, an interaction between the two factors should be included. To do this, select the two factors simultaneously by CTRL-clicking the factors and then select Add Interaction.
Factors to control for sample dependence include variables that account for relation between samples. If tissues are collected in pairs from the same patient, patient ID would be included. Similarly if tissues are collected from two distinct populations, this variable should probably be included as well.
Noise variables may be caused by technical processes used during sample collection and processing. Scan date and dye color are often among these variables.
Factors included in the GO ANOVA fall into two separate categories: the normal ANOVA factors (middle box) and those interacting with the gene (right-side box).
Fundamentally, you can run the GO ANOVA with the same parameters used to run a standard ANOVA analysis on gene expression data. (In other words, the middle box of the GO ANOVA is populated exactly as the normal ANOVA and the Interact with Gene box is left empty.) If such an analysis is run, the results would be similar to a standard statistical analysis, except resulting data will report on differential expression of functional categories instead of individual genes. Expression of a functional group is derived from the mean of all genes included within the group. Running GO ANOVA with the same parameters as the differential expression analysis is the most common method of running GO ANOVA. This keeps the analysis much more accessible and the results are easier to interpret.
There is no need to interact a factor with the gene if such an interaction is not of interest. The rightmost box in the GO ANOVA setup is optional and may be left empty if this is the case.
More advanced analysis can include factors that are interacted with the genes in the GO ANOVA model. After factors are added to the ANOVA factor(s) box, some can additionally be added to the Disruption Factor(s) box. At the mathematical level, this will include the Factor*Gene term in the model, called a Factor-Gene interaction. At the biological level, this will test whether patterns of gene expression within the functional group are being modified as a result of the factor. This altering of gene expression patterns is referred to in this document as the disruption of the functional group.
For example, if comparing different tissue types, adding tissue to the middle ANOVA factor(s) box will identify entire GO functional groups that are up- or down-regulated between tissue types. If comparing nerves and muscles, this might include such categories as myosin binding or actin production, which will be wholly up-regulated in muscles as the function is much less important to nerve function.
By interacting tissue with the gene in the model (adding tissue to the rightmost box), the interaction p-value may provide a method of discovering categories where total expression might not change significantly but the pattern of gene expression within the category is altered or disrupted. Within a functional group, the interaction p-value represents how similar the patterns of gene expression are between the different tissues. One example of a functional group identified by a tissue*gene interaction might include a category such as ion transfer. Ion transfer is equally important to both nerve and muscle function, but the distribution of ion channels and many of the responsible genes may be quite different between the two.
Sometimes factors may be included in the Interact with Gene box even if they are not of specific interest in a similar way that factors to control for noise are added to the ANOVA factors middle box. If any factors are included in Disruption Factor(s) box, to get the most accurate p-values, the more advanced model must fit the data as well as possible. All factors that may alter gene expression patterns should be included. It is important to keep in mind that the GO ANOVA is not only looking for significance in the factors included, but is attempting to generally fit the data. As appropriate factors are added to the model, not only are more aspects of the data analyzed; the model becomes a better fit to the true data and the results will become more accurate.
To understand how including a Gene*Factor interaction may improve the fit of the model, consider the complex GO ANOVA design in the case of a dose-time analysis of a drug. While it may seem clear that the ANOVA factors in the middle box - dose, time, and the dose*time interaction - should be specified (to consider the effect of dose, time, and the change in the effect of dose over time), what to put in the rightmost Gene*Factor box is not as clear. Adding dose alone (which is actually Dose*Gene) will check if different drug doses affect the pattern of gene expression. Similarly, adding time to the right box (which is actually Time*Gene) will identify gene ontology categories that are affected at different times but differentially across the genes. While this may be the true limit of questions of interest, including the interactions of the gene and both dose and time may be prudent. In general, if it is likely, or expected, that a factor will affect gene distribution within functional categories, then the factor should be included in the Disruption Factor(s) box if the gene distribution is being analyzed at all.
To review, including a factor in the middle box will identify GO categories whose expression is consistently affected across the genes within the category by the factor of interest. Including a factor in the right box (factor*gene) will identify gene ontology categories where the expression of the genes within the category is affected but not uniformly across the genes within the category.
GO ANOVA is not restricted to analysis of factors with only two levels. The ANOVA p-values test the hypothesis that all groups are equivalent. While this is useful in general, sometimes tests comparing only two sets of data are more desirable. Using contrasts to define pairwise comparisons in an ANOVA model is superior to using a test that is limited to a two-group comparison.
To specify individual pairwise comparisons, press the Contrast button. Contrasts are performed on groups already defined in the ANOVA model. If two tissue types should be compared to each other, select the tissue term from the Select Factor/Interaction dropdown in the upper left. Select either one or a set of categories and add them to group 1 and group 2. All samples falling into group 1 will be compared to all samples falling into group 2. Output will include not only a p-value, but also a fold change. This fold change will represent the average fold change of the GO category between the two groups. Fold change is calculated as Group 1 divided by Group 2. For data in log space, the data is antilogged as well; fold change output is always for data on a linear scale.
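The relationship between the ratio and the fold change reported for a contrast can be sketched in Python. This is an illustration under stated assumptions rather than the exact computation performed by Partek Genomics Suite: the function name is hypothetical, and for log-transformed data we assume that the difference of the group means is antilogged using the given base (base 2 in the example).

```python
def ratio_and_fold_change(group1_mean, group2_mean, log_base=None):
    """Return (ratio, fold_change) for Group 1 versus Group 2 on a linear scale."""
    if log_base is not None:
        # Assumption: on a log scale, the ratio is the antilog of the difference
        ratio = log_base ** (group1_mean - group2_mean)
    else:
        ratio = group1_mean / group2_mean
    # Increases are reported above 1; decreases are reported below -1
    fold_change = ratio if ratio >= 1 else -1.0 / ratio
    return ratio, fold_change


print(ratio_and_fold_change(8.0, 7.0, log_base=2))   # log2 data, Group 1 higher
print(ratio_and_fold_change(6.0, 7.0, log_base=2))   # log2 data, Group 1 lower
print(ratio_and_fold_change(10.0, 40.0))             # linear data
```

A ratio of 0.5 and a fold change of -2 therefore describe the same two-fold decrease.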
Check Exclude probe sets and differential expression p-value(s) > to filter out probe sets (=genes) that are not expressed in any of the samples. The Exclude probe sets option will remove any gene that meets the specified limit. Using the default options, this will remove low expression genes. Note that the default value of 3 is a suggestion for Affymetrix expression arrays and may not be applicable for other data sets. We suggest performing exploratory analysis and inspecting the distribution of the expression values first (e.g. View > Histogram > Row or View > Box and Whiskers > Row). The sub-checkbox, differential expression p-values, provides an override to the low expression limit. Here, a gene will be included in the analysis despite a low expression value if the gene displays a p-value below the specified limit, suggesting that the gene is differentially expressed.
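The logic of this filter, including the p-value override, can be sketched in Python. The sketch assumes that a gene counts as low expression when its highest value across all samples does not exceed the limit; the summary actually used by Partek Genomics Suite is not stated here. The function name and the p-value limit of 0.05 are hypothetical, while the expression limit of 3 is the default mentioned above.

```python
def keep_gene(expression_values, p_value=None, min_expression=3.0, p_limit=0.05):
    """Decide whether a gene stays in the analysis.

    Assumption: a gene is 'low expression' when its maximum value across
    samples does not exceed min_expression.
    """
    expressed = max(expression_values) > min_expression
    # Override: a low expression gene is kept if it looks differentially expressed
    rescued = p_value is not None and p_value < p_limit
    return expressed or rescued


print(keep_gene([1.0, 2.5]))                 # low in every sample: excluded
print(keep_gene([1.0, 2.5], p_value=0.01))   # low, but rescued by its p-value
print(keep_gene([1.0, 4.2], p_value=0.9))    # expressed in one sample: kept
```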
When looking for simple differential expression, sorting in ascending order on the factor p-value is ideal. This will find the groups that differ most significantly across all the contained genes. In the interest of finding groups that are less likely to be called by chance, it may be wise to filter to groups with a minimum of 4 or 5 genes (Figure 1). Simple filters can be applied using the interactive filter, available from the button on the toolbar at the top of the screen.
If there is more than one factor in the model, more complex criteria combining the factors can be specified using the Advanced tab of Tools > List Manager. For example, to find categories that are significant and changed by at least two-fold, make two criteria: one for a low p-value and the other for a minimum of two-fold change, and take the intersection of the two criteria.
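The intersection of two criteria can be pictured with a small sketch. The category names, values, and cutoffs below are hypothetical; fold changes are assumed to be signed, with negative values for down-regulation, so the absolute value is tested.

```python
# Hypothetical GO ANOVA results: one record per functional category
categories = [
    {"name": "category_1", "p_value": 0.001, "fold_change": 3.2},
    {"name": "category_2", "p_value": 0.200, "fold_change": 2.5},
    {"name": "category_3", "p_value": 0.004, "fold_change": -2.8},
    {"name": "category_4", "p_value": 0.010, "fold_change": 1.1},
]

# Criterion 1: low p-value
low_p = {c["name"] for c in categories if c["p_value"] < 0.05}
# Criterion 2: at least two-fold change in either direction
two_fold = {c["name"] for c in categories if abs(c["fold_change"]) >= 2}

# Intersection: categories that satisfy both criteria
selected = low_p & two_fold
print(sorted(selected))
```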
Figure 1. Top ten functional groups sorted by the Tissue p-value after filtering to a minimum of five genes in the GO category. Note that most of the groups can be directly related to the heart muscle
If the disruption (factor*gene interaction) is tested, the filters can become more complicated. The most pressing need for complex filters is that when analyzing larger functional groups it is not expected that the entire functional group will behave the same. Looking back at Figure 1, notice how the low values in column 7 are present because not every gene is equally differentially expressed even in the most differentially expressed of groups. That is, when there is significant differential expression, it is likely that there will also be disruption as at least a single gene is likely participating in a role beyond that of the functional group and will not follow the pattern of the rest of the group. This situation is expected and leads to a new type of filter.
Filtering for low p-values on the factor and then filtering for low p-values on the factor interacted with gene will find groups that are differentially expressed, but contain at least a few genes that are either disrupted due to treatment, or simply are involved in additional functional groups beyond the scope of the one being analyzed. This list often contains some of the more informative big picture functional groups.
Figure 2. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category. By prioritizing by the disruption column this type of a list is more "big picture"
When looking for groups that are not so much differentially expressed but instead express different genes under different treatments, filter for low disruption p-values and high factor p-values. As shown in Figure 2, large or diverse groups that are differentially expressed will often exhibit significant disruption. In fact, a group that is differentially expressed but includes even a single gene that is not changed will have very significant disruption. These situations are certainly notable, but they are distracting when looking for functional groups that are uniquely patterned based on treatment. After filtering out the groups with low p-values for the factor and then looking at the remaining groups with low p-values for disruption, the groups observed usually have very distinct patterns of expression (Figure 3).
Figure 3. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category and minimum Tissue p-value of 0.3. This list is especially interesting, as using enrichment alone to detect such categories would require a lot of labour.
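The filter behind a list like the one in Figure 3 can be sketched as follows. The records and field names are hypothetical; the cutoffs mirror the ones named in the caption (a minimum of five genes and a minimum factor p-value of 0.3).

```python
# Hypothetical GO ANOVA output rows
go_results = [
    {"category": "cat_A", "genes": 12, "factor_p": 0.62, "disruption_p": 0.0004},
    {"category": "cat_B", "genes": 8, "factor_p": 0.001, "disruption_p": 0.0001},
    {"category": "cat_C", "genes": 3, "factor_p": 0.80, "disruption_p": 0.0020},
    {"category": "cat_D", "genes": 20, "factor_p": 0.45, "disruption_p": 0.0100},
]


def uniquely_patterned(results, min_genes=5, min_factor_p=0.3):
    """Keep categories that are large enough and NOT differentially expressed,
    then rank them by disruption p-value (most significant first)."""
    kept = [
        r for r in results
        if r["genes"] >= min_genes and r["factor_p"] >= min_factor_p
    ]
    return sorted(kept, key=lambda r: r["disruption_p"])


ranked = [r["category"] for r in uniquely_patterned(go_results)]
print(ranked)
```

In this sketch cat_B is dropped because it is differentially expressed, and cat_C is dropped because it contains too few genes.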
There are two main visualizations for use with GO ANOVA outputs:
Dot plots used to visualize differential expression of functional groups
Profile plots used for visualizing disruption of gene expression patterns within the group
Dot plots represent each sample with a single dot. The position of each dot is calculated as the average expression of all genes included in the functional group. Invoke this plot by right clicking on the row header of a functional group of interest and choosing Dot Plot (Orig. Data). The color, shape, and size of the dots can be set to represent sample information in the plot properties dialogue, invoked by pressing on the red ball in the upper left.
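As a rough illustration of how each dot position is derived, the sketch below averages the expression of all genes in a functional group for each sample. The gene names and values are hypothetical, and the data layout (one list of per-sample values per gene) is an assumption made for the example.

```python
def dot_positions(expression_by_gene):
    """Average expression of all genes in a group, one value per sample.

    expression_by_gene maps a gene name to a list of per-sample values;
    every list is assumed to have the same sample order and length.
    """
    rows = list(expression_by_gene.values())
    n_samples = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(n_samples)]


# Hypothetical group of two genes measured in two samples
group = {"gene_1": [8.0, 6.0], "gene_2": [10.0, 6.0]}
positions = dot_positions(group)
print(positions)
```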
Figure 1 shows a dot plot for a GO category "cell growth involved in cardiac muscle cell development", which is expressed in the heart at a level of almost four times that of the brain, evidenced by the difference of just under two units on the y-axis (in the current example the values on the y-axis are shown in log2 space). Note that the replicates are grouped neatly, making this category highly significant. That is not a surprise, given that the genes belonging to that category are likely very specific for the heart.
Figure 1. Dot plot of a significantly differentially expressed GO category. Each dot is a sample, box-and-whiskers summarize groups
Profile plots, or profiles, represent each category of one of the GO ANOVA factors as a few overlapping lines. Horizontal coordinates refer to individual genes or probes in the original data. Vertical coordinates represent the expression of the individual gene. Invoke this plot by right-clicking on the row header of a functional group of interest and choosing Profile (Orig. Data). This plot is useful because the pattern of gene expression in the group is displayed as a line. If the pattern is conserved across treatments, the lines will lie parallel, but if a gene reacts differently, the lines will follow different patterns and may even cross each other.
The profile plot in Figure 2 visualizes a GO category without differential expression, but with significant disruption. Note that the gene TNNI3 is up-regulated in the heart, while STX1A is down-regulated in the heart.
Figure 2. Profile plot of a GO category with significant disruption but not differential expression. Each data point is a gene (error bars are standard error of the mean)
The next dialog (Figure 2) specifies the method of mapping genes to gene sets. Default mapping file is built from annotation files from . Custom mapping file points to the mapping files available on the local computer and present in the Microarray libraries directory. Create a new mapping file from the chip's annotation file option will try to build the annotation file from the annotation file created by the microarray vendor. Create a new mapping file from a spreadsheet enables you to create a custom mapping file from an open spreadsheet, which has gene symbols on one column, and gene groups on the other column. Finally, files in gene matrix transposed (GMT) or gene annotation (GA) formats can also be used.

Column label | Description of format | Example |
---|---|---|
chromosome | a chromosome label | 3 |
start | an integer, the start position (in base pairs) of the feature | 69871322 |
stop | an integer, the stop position (in base pairs) of the feature | 70100176 |
genomic_coordinates | chromosome:start-stop | 3:69871322-70100176 |
strand | + for top, - for bottom | + |
physical position | an integer, the position (in base pairs) of the feature | 70100176 |

ProbeID | GeneName | GenomicCoordinates | Cytoband |
---|---|---|---|
A_44_P1025812 | TC521361 | chr12:2546883-2546824 | rn |

Probe Set ID | Chromosome | Physical Position | Strand | Cytoband |
---|---|---|---|---|
SNP_A-1512540 | 9 | 22205296 | - | p21.3 |

probeset_id | seqname | strand | start | stop |
---|---|---|---|---|
2315588 | chr1 | + | 1155398 | 1155624 |
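The genomic_coordinates format listed above (chromosome:start-stop) is straightforward to split into its parts. The helper below is a hypothetical sketch for checking annotation values outside the software; it is not a Partek function.

```python
def parse_genomic_coordinates(text):
    """Split 'chromosome:start-stop' into (chromosome, start, stop).

    The chromosome label is kept as a string because it may be a number,
    a letter, or carry a 'chr' prefix; start and stop are base-pair integers.
    """
    chromosome, span = text.rsplit(":", 1)
    start, stop = span.split("-")
    return chromosome, int(start), int(stop)


print(parse_genomic_coordinates("3:69871322-70100176"))
print(parse_genomic_coordinates("chr12:2546883-2546824"))
```

Note that the second example, taken from the tables above, has a start larger than its stop, so the sketch deliberately does not assume any ordering of the two positions.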
The volcano plot displays p-values and fold-changes of numerous genomic features (e.g., genes or probe sets) at the same time. This allows differentially expressed genes to be quickly identified and saved as a gene list.
Note: the same list can be generated without a visual aid using the List Manager (ANOVA Streamlined tab).
We will invoke a volcano plot from an ANOVA results child spreadsheet with genes on rows.
Select View from the main toolbar
Select Volcano Plot (Figure 1)
Figure 1. Invoking a volcano plot on an ANOVA results spreadsheet
The Volcano Plot Configure dialog will open (Figure 2).
Figure 2. Select the columns to display in the volcano plot
Select the fold-change and p-value columns you would like to visualize from the ANOVA results spreadsheet; here we have chosen 12. Fold-Change(Down Syndrome vs. Normal) for the X Axis and 10. p-value(Down Syndrome vs. Normal) for Y Axis
Select OK
The volcano plot will open in a new tab (Figure 3). Control and color options for the volcano plot are largely similar to those described for a dot plot. On volcano plots with many probe(sets)/genes, the shapes and sizes of individual probe(sets)/genes will not be visible until they are selected.
Figure 3. The volcano plot shows each probe(set)/gene as a point. The X Axis shows fold change with no change (N/C) as the mid-point. The Y Axis shows p-values in descending value from a maximum of 1 at the X Axis intersection.
To facilitate analysis, we can add cutoff lines for both fold-change and p-value.
Select the Plot Rendering Properties icon to open the Plot Rendering Properties dialog
Select the Axes tab
Select Set Cutoff Lines (Figure 4)
Figure 4. Adding cutoff lines to the volcano plot
Set Vertical Line(s) to 1.3 and -1.3
Set Horizontal Line(s) to 0.05
Select Select all points in a section
Select OK (Figure 5)
Figure 5. Setting cutoff lines. The vertical lines are fold-change cutoffs. The horizontal line is a p-value cutoff.
Select OK to close the Plot Rendering Properties dialog
The volcano plot now has cutoff lines for fold-change and p-value (Figure 6).
Figure 6. Cutoff lines facilitate visual analysis of ANOVA results
Because we selected Select all points in a section when adding the cutoff lines, selecting any of the quadrants will select all probe(sets)/genes in that quadrant. If this option is not selected, individual probe(sets)/genes or groups can be selected using selection mode. Gene lists can be generated from selected probe(sets)/genes.
If columns are selected in the ANOVA results source spreadsheet for the volcano plot, only those columns will be included in the created list.
Select the upper right-hand quadrant of the volcano plot
Right click the selected quadrant
Select Create List (Figure 7)
Figure 7. Creating a gene list from a volcano plot
Give the new list a name and description as appropriate
Select OK
The list will be saved as a text file and open as a child spreadsheet in the Analysis tab.
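As the earlier note mentions, the same list can be produced without the plot. The sketch below shows the selection logic for the cutoffs used above (fold change of 1.3 / -1.3 and p-value of 0.05). It is an illustration only: the gene records are hypothetical, fold changes are assumed to be signed, and how values exactly on a cutoff line are treated is an assumption.

```python
def volcano_section(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.05):
    """Classify a gene by the cutoff lines drawn on the volcano plot."""
    if fold_change >= fc_cutoff:
        side = "up"
    elif fold_change <= -fc_cutoff:
        side = "down"
    else:
        side = "unchanged"
    return side, p_value < p_cutoff


# Hypothetical ANOVA results: (gene, fold change, p-value)
results = [
    ("gene_1", 1.8, 0.001),
    ("gene_2", -1.6, 0.010),
    ("gene_3", 1.1, 0.020),
    ("gene_4", 1.5, 0.300),
]

# Section with fold change above the cutoff and p-value below the cutoff
up_significant = [
    gene for gene, fc, p in results if volcano_section(fc, p) == ("up", True)
]
print(up_significant)
```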
A scatter plot is a simple way to visualize differentially expressed genes. We can plot a scatter plot with gene expression values for two samples at one time. While most probe(sets)/genes fall on a 45° line, up- or down-regulated genes are positioned above or below the line.
To draw a scatter plot, you first need to transpose the original intensities spreadsheet so that the samples are on columns and probe(sets)/genes are on rows.
Select the main spreadsheet
Select Transform from the main toolbar
Select Create Transposed Spreadsheet...
Select the column with sample IDs from the drop-down menu
Select OK
A new temporary spreadsheet will be created with probe(sets)/genes on rows and samples on columns.
Select the two sample columns you would like to compare
Select View from the main toolbar
Select Scatter Plot (Figure 1)
Figure 1. Invoking a scatter plot from a spreadsheet with probe(sets)/genes on rows and samples on columns
Select Yes when asked if you want to only use the selected columns
Select Yes when asked if you are sure you would like to draw the scatter plot
The scatter plot will open in a new tab. We can add a regression line to the plot.
Select the Plot Rendering Properties icon from the plot command bar
Select Axes
Select Set Regression Lines
Select Regression line of y on x
Set Line Width to 5
Select OK (Figure 2)
Figure 2. Configuring a regression line
Select OK to close the Plot Rendering Properties dialog
The scatter plot now features a regression line dividing the probe(sets)/genes (Figure 3).
Figure 3. Each dot on the plot represents the intensity value of a probe(set)/gene
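For readers who want to see what a regression line of y on x represents, here is a minimal ordinary least squares sketch. It assumes the line is a standard least squares fit, which is the usual meaning of the term but is not confirmed here for the software; the sample values are hypothetical.

```python
def regress_y_on_x(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical intensities of four probe sets in two samples
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [2.0, 4.0, 6.0, 8.0]
slope, intercept = regress_y_on_x(sample_1, sample_2)
```

Probe(sets)/genes far above or below the fitted line are the candidates for up- or down-regulation between the two samples.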
The MA plot can be used to display a difference in expression patterns between two samples. The horizontal axis (A) shows the average intensity while the vertical axis (M) shows the intensity ratio between the two samples for the same data point. In essence, an MA plot is a scatter plot tilted to the side so that the differentially expressed probe(sets)/genes are located above or below the 0 value of M. An MA plot is also useful to visualize the results of normalization where you would hope to see the median of the values follow a horizontal line.
The MA plot is invoked on the original intensities spreadsheet without any need for transposition.
Select View from the main toolbar
Select MA Plot
The MA plot will launch in a new tab showing the first two rows as the comparison (Figure 4).
Figure 4. MA plot comparing the expression levels between two samples. Each dot on the plot represents a single genomic feature (gene or probe set). The average signal for each genomic feature is shown on the horizontal axis (A), while the ratio is shown on the vertical axis (M).
The samples displayed can be changed using the select sample menus on the left-hand side.
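The two axes of an MA plot can be computed directly from a pair of samples. The sketch below assumes the intensities are already log2-transformed, so the ratio becomes a difference; the values are hypothetical.

```python
def ma_values(sample_1, sample_2):
    """Return (M, A) lists for two samples of log2 intensities.

    M is the log ratio (difference of logs) and A is the average log
    intensity of each genomic feature.
    """
    m_values = [v1 - v2 for v1, v2 in zip(sample_1, sample_2)]
    a_values = [(v1 + v2) / 2 for v1, v2 in zip(sample_1, sample_2)]
    return m_values, a_values


# Hypothetical log2 intensities for two features
m_values, a_values = ma_values([8.0, 6.0], [6.0, 6.0])
print(m_values, a_values)
```

A feature with M near 0 is expressed at a similar level in both samples, regardless of its average intensity A.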
The XY plot / bar chart displays the intensity of one probe(set)/gene across two categorical variables. Only one probe(set)/gene may be visualized at a time.
We will invoke an XY plot from a gene list child spreadsheet with genes on rows. The parent spreadsheet should include the categorical variables you want to chart.
Right-click on the row header of the gene you want to visualize
Select XY Plot (Orig. Data) from the pop-up menu (Figure 1)
Figure 1. Invoking an XY Plot from a gene list child spreadsheet
An XY plot will be displayed in a new tab (Figure 2).
Figure 2. By default, an XY plot invoked from a gene list will have the first categorical variable as columns and the second categorical variable as shapes/colors
To display the change in gene expression over time for each treatment condition, we need to modify this plot.
Select the plot configuration icon from the plot command bar
Set X-Axis to 3. Time using the drop-down menu
Set Separate by to 2. Treatment using the drop-down menu
Select OK
To help visualize the connection between time points, we can add connecting lines.
Select the Plot Rendering Properties icon from the plot command bar
Set Plot Style to lines using the drop-down menu
Select OK
The plot now shows time on the x-axis, plots treatments, and connects treatments across time points with lines (Figure 3). Each point is the LS mean value of all samples with the same values for the two selected categorical variables. The error bars are standard error.
Figure 3. Modifying the XY plot to enable analysis of gene expression changes in a treatment condition over a time course. In this experiment, only the control was measured at time 0.
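To make the plotted quantities concrete, the sketch below groups samples by two categorical variables and computes a mean and a standard error for each group. It uses the plain arithmetic mean, which matches an LS mean only in simple cases such as a balanced design with no other model terms; that simplification, the record layout, and the values are all assumptions of the example.

```python
import math
from collections import defaultdict


def group_mean_and_sem(records):
    """records: iterable of (time, treatment, value) tuples.

    Returns {(time, treatment): (mean, standard error of the mean)}.
    The standard error is NaN for groups with a single sample.
    """
    groups = defaultdict(list)
    for time, treatment, value in records:
        groups[(time, treatment)].append(value)

    summary = {}
    for key, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            sem = math.sqrt(variance / n)
        else:
            sem = float("nan")
        summary[key] = (mean, sem)
    return summary


# Hypothetical intensities of one gene
records = [(0, "control", 5.0), (0, "control", 7.0), (4, "drug", 9.0)]
summary = group_mean_and_sem(records)
print(summary[(0, "control")])
```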
While most of the plot controls are shared with the dot plot, XY plot does have a few unique options.
Select () to automatically cycle through each row (gene) in the source spreadsheet
Select () to stop the cycling
This feature is useful when performing visual analysis of patterns in gene expression changes in a list of genes.
The drop-down menu adjacent to the previous/next controls lets you switch source spreadsheets.
Lines, but not points, can be selected when using Selection Mode.
It is also possible to invoke an XY plot from the parent spreadsheet using the main toolbar.
Select the parent spreadsheet in the spreadsheet tree
Select View from the main toolbar
Select XY Plot / Bar Chart ...
The Create XY Plot / Barchart dialog will open (Figure 4).
Figure 4. Invoking an XY Plot from the main toolbar
An XY plot will be displayed in a new tab (Figure 5).
Figure 5. The gene name associated with the probe(set) column is displayed as the chart title by default
Selecting previous/next will navigate along either rows or columns, whichever has probe(set)/gene information.
To switch this plot to one of the gene lists we have created, we can use the drop-down menu next to the previous/next controls.
The data displayed by an XY plot can instead be displayed as a bar chart with overlaid bars, vertically stacked bars, or horizontally stacked bars. A bar chart can be directly invoked or an XY plot can be converted into a bar chart (and vice versa).
Invoke the plot from a gene list using the Bar Chart (Orig. Data) option in the pop-up menu (Figure 1)
Invoke the plot from the main toolbar by selecting one of the bar chart options in the Line Style drop-down menu (Figure 4)
Invoke the plot as an XY plot, select the Plot Rendering Properties icon, then select one of the bar chart options from the Plot Style drop-down menu in the Plot Rendering Properties dialog (Figure 6)
Figure 6. An XY Plot can be converted to a Barchart using the Plot Rendering Properties dialog
The profile plot displays probe(set)/gene intensity values across samples and genes.
We will invoke a profile plot from a gene list child spreadsheet with genes on rows.
Select the rows to be visualized
Right-click on a row header of one of the selected rows
Select Profile Plot (Orig. Data) from the pop-up menu (Figure 1)
Figure 1. Selecting Profile Plot for selected rows
The profile plot will be displayed in a new tab (Figure 2). Lines are probe(sets)/genes and columns are samples from the parent spreadsheet.
Figure 2. Basic profile plot. Each line represents a different probe(set)/gene; each column represents a sample from the parent spreadsheet
A basic profile plot will likely need customization. The plot configuration, properties, and control options are the same as shown for a dot plot. We will illustrate a few modifications here.
We can change the row labels to show each sample ID.
Select ()
Select the Axes tab
Set Grid to 1
Select Rotate X-Axis Labels and set to 90 degrees (rotates counter-clockwise)
Set Label Format to Column and select 5. Subject
We can add symbols to show which group each sample belongs to.
From the Shape by drop-down menu, select 3.Type
Select OK
Symbols have now been added to each profile line plot (Figure 3).
Figure 3. The profile plot can be modified to facilitate analysis or presentation
Note that samples present on the parent spreadsheet cannot be excluded from the profile plot. To plot only a subset of the samples you must filter the parent spreadsheet.
The primary use of the dot plot is visualizing intensity values across samples.
We will invoke a dot plot from a gene list child spreadsheet with genes on rows.
Right-click on the row header of the gene you want to visualize
Select Dot Plot from the pop-up menu (Figure 1)
Figure 1. Creating a dot plot of gene intensity values
A dot plot will be displayed in a new tab (Figure 2).
Figure 2. Simple dot plot of a single gene that shows the distribution of intensities across all samples
There are many customizations that can be made to this simple plot.
Select Configure Plot from the plot command bar to launch the Configure Plot dialog (Figure 3).
Figure 3. Configuring the data shown on the plot
The Configure Plot dialog lets you change how the data is displayed on the plot. We will make a change to illustrate the possibilities.
Set Group by to 4. Tissue using the drop-down menu
This allows us to group the samples by any categorical attribute. These attributes are specified in the parent spreadsheet.
Select OK to modify the plot
We could also have changed the grouping of samples using the Group by drop-down menu above the plot.
The order of the group columns is alphabetical by default, but can be changed to match the spreadsheet order by selecting Categoricals in spreadsheet order in the Configure Plot dialog (Figure 3).
Select Plot Properties from the plot command bar to launch the Plot Properties dialog (Figure 4)
Figure 4. Changing the appearance of a dot plot using the plot properties dialog
The Plot Properties dialog lets you change the appearance of the plot. We will make a few changes to illustrate the possibilities.
Set Shape to 3. Type using the drop-down menu
Select the Box&Whiskers tab
Set Box Width to 15 pixels
Select the Titles tab
Set X-Axis under Configure Axes Titles to Tissue
Select OK to modify the plot
Alternatively, we could have changed the shapes using the Shape by drop-down menu above the plot. The dot plot now shows four columns, each with a thinner box-and-whiskers plot, and different shapes for different sample types (Figure 5).
Figure 5. The Dot Plot can be modified to optimally visualize your data
Like many visualizations in Partek Genomics Suite, the dot plot is interactive.
Select the Selection Mode icon to activate Selection Mode
Legends can now be dragged and dropped to new locations on the plot. Samples can be selected by left-clicking the sample or left-clicking and dragging a box around samples.
Select the Zoom Mode icon to activate Zoom Mode
Left-clicking on a region will zoom in on it. The zoom level can be reset using the reset zoom icon.
After zooming in, select the Pan Mode icon to activate Pan Mode
Left-click and drag to move around the plot.
Select the previous/next controls to move between rows on the source spreadsheet
Select the swap axes icon to swap the horizontal and vertical axes
This user guide illustrates:
This user guide assumes the user is familiar with the hierarchy of spreadsheets and analysis in Partek Genomics Suite.
Many plots available in Partek Genomics Suite are not discussed in this user guide. A more thorough review of Partek Genomics Suite visualizations can be found in Chapter 6: The Pattern Visualization System of the Partek User's Manual available from Help > User’s Manual in the Partek Genomics Suite main toolbar.
There is no specific data set for this tutorial. You may use one of your own microarray experiments or use a data set from one of our tutorials.
Visualizations are generated using data from a spreadsheet. Some visualizations allow interactive filtering on the plot, but others do not. If you only wish to include certain rows or columns in a visualization, you may need to create a spreadsheet with only the rows or columns of interest by applying a filter and cloning the spreadsheet.
In general, probe(set)/gene intensity values may be visualized from either an ANOVA spreadsheet or a filtered ANOVA spreadsheet. Because intensity data is stored in the parent spreadsheet, the parent and child spreadsheets should be visible in the spreadsheet navigator with the appropriate parent/child relationship (Figure 1).
Figure 1. Down_Syndrome-GE is the parent spreadsheet; ANOVAResults and A are child spreadsheets of Down_Syndrome-GE
The Manhattan plot is a common way to visualize p-values or log-odds ratios from GWAS across genomic coordinates.
The starting point for a Manhattan plot is a spreadsheet with SNPs on rows and p-values or log-odds ratios in a column. If beginning with p-values, you will need to convert the p-values to -log10(p-value).
Select the column with p-values
Select Transform from the main toolbar
Select Normalization & Scaling
Select On Columns...
In the Normalization tab, set Base of the Log(x + offset) to 10
Select OK
Go to Transform > Normalization & Scaling > On Columns... again
Select the Add/Mul/Sub/Div tab
Set Multiply by Constant to -1
Select OK
The column now contains -log10(p-value).
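The two transformation steps above (log base 10, then multiplication by -1) amount to the following calculation, shown here as a small sketch with hypothetical p-values.

```python
import math


def neg_log10(p_values):
    """Convert p-values to -log10(p-value), the usual Manhattan plot y-axis.

    p-values must be greater than 0; a p-value of exactly 0 has no
    finite logarithm.
    """
    return [-math.log10(p) for p in p_values]


transformed = neg_log10([0.01, 0.00001, 1.0])
print(transformed)
```

Smaller p-values give larger transformed values, so the most significant SNPs appear as the tallest points on the plot.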
We can now invoke the initial plot.
Select View from the main toolbar
Select Genome View
The Genome View tab will open. This plot will need to be configured.
Select the Profiles tab
Remove any unwanted profiles
Select Add profile
Select Column
Select the column with the -log10(p-value) or log-odds ratio values from the drop-down menu
Select Value for Color by
Select point from the Style drop-down menu
Select OK to add the profile
Select OK to close the Configure Plot Properties dialog
The plot will now show a Manhattan plot (Figure 1).
Figure 1. Customized Genome View showing genomic locations on the x-axis and -log10(p-values) of SNPs on the y-axis (Manhattan plot). Each dot represents a single SNP. The cytoband is shown along the bottom of the plot
It is also possible to display multiple chromosomes at the same time.
Select Show All in the upper-right hand corner of the plot
This displays all chromosomes vertically. We can display them horizontally for a better view.
Select Genome in line for Layout
Select OK
To further improve the genome-wide view, we can remove the cytoband, remove the genomic position label, color points by chromosome, and increase point size.
Select Cytoband in the upper right-hand corner
Select the Axes tab
Deselect Show Base Pair Labels
Select Profiles
Select Configure
Set Color By to a column that gives the chromosome of each SNP/locus as a category
Set Shape Size to 5.0
Select OK to close the Configure Profile dialog
Select OK to apply changes
The plot will appear as shown (Figure 2).
Figure 2. Full genome Manhattan plot
For details on Genome View see Chapter 6: The Pattern Visualization System in the Partek User's Manual.
Sort Rows by Prototype is a function that can identify genes with similar expression patterns. For example, if a gene with an interesting expression pattern has been detected, using Sort Rows by Prototype makes it possible to find other genes that have a similar pattern of intensity values. Although this is most commonly used for changes in gene expression over a time course, it can be applied to other experimental designs as well.
To invoke Sort Rows by Prototype, probe(sets)/genes must be on rows. If you want to use this tool to analyze the main intensity values spreadsheet, the spreadsheet must be transposed prior to analysis. A common way to view and analyze gene expression in a time-series experiment is to include means or LS means in the ANOVA spreadsheet.
Configure the ANOVA dialog to include the factor or interaction of interest
Select Advanced... from the ANOVA dialog
Select LS-Mean or Mean
Use the drop down menus to select the factors or interaction you want the LS mean / mean of
Select Add for each
Select OK (Figure 1)
Figure 1. Using Advanced ANOVA setup to include group means in the ANOVA output
Select OK to close the ANOVA configuration dialog and open the ANOVA spreadsheet
The Sort Rows by Prototype function uses every non-text column in a spreadsheet to build and compare patterns; any columns you do not want to include in the pattern similarity analysis need to be removed before running the function.
If you want to preserve the ANOVA spreadsheet contents, clone the ANOVA spreadsheet prior to deleting columns.
Select columns you want to remove
Right-click on a selected column header
Select Delete from the pop-up menu
We can now invoke Sort Rows by Prototype on the modified spreadsheet.
Select Tools from the main toolbar
Select Discovery
Select Sort Rows by Prototype... (Figure 2)
Figure 2. Invoking Sort Rows by Prototype on spreadsheet with LS mean values for conditions/time points
The Sort Rows by Prototype dialog will launch (Figure 3).
Figure 3. Sort Rows by Prototype dialog
This dialog allows you to configure the pattern, or prototype, that all probe(sets)/genes will be compared to by Sort Rows by Prototype.
The Select Dissimilarity Measure drop-down menu lets you select from a wide variety of parametric and non-parametric measures of dissimilarity.
After configuring the prototype and selecting a dissimilarity measure, select Sort to run the function
Select Cancel to close the dialog
A new column (column 1) will be added to the spreadsheet and the rows will be reordered (Figure 4). The new column contains the dissimilarity score for each row; the lower the value, the more similar the row is to the prototype. The row with the highest similarity to the prototype is listed first, with the other rows listed in order of decreasing similarity to the prototype.
Figure 4. Result of sorting by prototype. The prototype gene is in the first row, while the other genes are listed based on their similarity to the prototype gene. Smaller proximity values imply more similarity to the selected shape
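The idea behind the sort can be sketched in a few lines. Euclidean distance is used here purely as an example of a dissimilarity measure; the gene names and LS mean values are hypothetical, and the function is not the software's implementation.

```python
import math


def sort_rows_by_prototype(rows, prototype):
    """Score each row against the prototype and sort by dissimilarity.

    rows maps a gene name to its numeric values (e.g. LS means per time
    point); the result is a list of (dissimilarity, gene) pairs with the
    most similar row first.
    """
    def euclidean(values):
        return math.sqrt(sum((v - p) ** 2 for v, p in zip(values, prototype)))

    return sorted((euclidean(values), gene) for gene, values in rows.items())


# Hypothetical LS means at three time points; gene_A serves as the prototype
rows = {
    "gene_A": [1.0, 2.0, 3.0],
    "gene_B": [3.0, 2.0, 1.0],
    "gene_C": [1.0, 2.0, 4.0],
}
ranking = sort_rows_by_prototype(rows, prototype=rows["gene_A"])
print([gene for _, gene in ranking])
```

The prototype row itself gets a score of 0 and therefore appears first, as in Figure 4.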
To view the results, we can generate a profile plot of several of the rows. For example, here we will show the top five most similar probe(sets)/genes.
Select the row headers of the top 5 rows by selecting each while holding the Ctrl key or selecting the first then fifth while holding the Shift key
Select View from the main toolbar
Select Profiles
Select Row Profiles
Select Select for both Plots and X-Axis in the Configure Data Source dialog
The profile plot will open as a new tab (Figure 5).
Figure 5. Profile plot of 5 probe(sets)/genes most similar to the prototype used in Sort rows by prototype
This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.
Select () from the plot command bar
Select to open the Configure Plot dialog
Select the save icon from the main command bar to save the modified spreadsheet
The Pattern Type options allow preset shapes to be applied to the prototype within the range specified by the Begin, End, Min, and Max parameters. The final option, From Row, allows you to select any row number in the spreadsheet to serve as the prototype. This is a useful option if you have a particular gene of interest and want to find other genes with similar expression profiles in your data set. You can also manually configure the prototype by dragging the points.
The Violin plot in Partek Genomics Suite is similar to the Profile Trellis plot in that it displays probe(set)/gene intensity values across samples and genes. However, the Violin plot has additional options not shared by the Profile Trellis plot. Here, we will explore one use case for the Violin plot.
For this example, we will use the data set and lists created in the Gene Expression tutorial. We have a list of 23 genes that are differentially regulated in tissue samples from patients with Down syndrome and normal controls. We want to display the mean intensity values for Down syndrome and normal samples for each of the 23 genes on a single plot. To do this, we first need to filter the probe intensities spreadsheet to include only the intensity values for the 23 genes of interest.
With the probe intensities spreadsheet and the gene list open in the Analysis tab, follow these steps to filter the probe intensities spreadsheet.
Select the probe intensities spreadsheet in the spreadsheet tree; here, it is Down_Syndrome-GE
Select Filter from the main task bar
Select Filter Columns
Select Filter Columns Based on a List... (Figure 1)
Figure 1. Invoking filter columns by a list
The Filter Columns dialog will open (Figure 2).
Figure 2. Configuring the Filter Columns dialog to filter by probe set ID
Select your gene list from the Filter base on spreadsheet drop-down menu; here, we selected Down_Syndrome_vs._Normal
Select the column of your gene list that matches the column IDs you want to filter from your probe intensities spreadsheet; here, we selected 2. Probeset ID
Select OK to apply the filter
A black and yellow horizontal bar will appear at the bottom of the spreadsheet. This is the filter indicator showing the proportion of columns (genes/probesets) filtered out (black) and retained (yellow). To continue working with the filtered probeset intensities, we can clone the filtered spreadsheet.
Right-click on the filtered probe intensities spreadsheet in the spreadsheet tree
Select Clone... from the pop-up menu (Figure 3)
Figure 3. Cloning a spreadsheet with a filter applied will clone only the retained rows/columns
Name the new spreadsheet; we chose 2
Select OK
The cloned spreadsheet is a temporary file. To ensure we can use it again if we close Partek Genomics Suite, we should save the filtered probe intensities spreadsheet.
Select the save icon from the main command bar
Name the new file; we chose Down_Syndrome_vs_Normal_Probe_Intensities
Now we have a spreadsheet containing only the probe intensity values for our 23 genes of interest (Figure 4).
Figure 4. Filtered probe intensities spreadsheet
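For readers who want to reproduce this filter outside the GUI, the idea can be sketched in a few lines of Python: keep only the columns whose IDs appear in the gene list. The function name `filter_columns_by_list` and the probe set IDs below are hypothetical and are not part of Partek Genomics Suite.

```python
def filter_columns_by_list(header, rows, keep_ids):
    """Keep only the columns whose header ID is in keep_ids.

    The original column order is preserved, as in a spreadsheet filter.
    """
    keep = set(keep_ids)
    kept_idx = [i for i, col_id in enumerate(header) if col_id in keep]
    new_header = [header[i] for i in kept_idx]
    new_rows = [[row[i] for i in kept_idx] for row in rows]
    return new_header, new_rows


# Toy data: 2 samples (rows) x 4 probe sets (columns); IDs are made up.
header = ["ps_1", "ps_2", "ps_3", "ps_4"]
rows = [
    [7.1, 5.2, 9.0, 6.3],
    [6.8, 5.5, 8.7, 6.1],
]
gene_list = ["ps_3", "ps_1"]

filtered_header, filtered_rows = filter_columns_by_list(header, rows, gene_list)
```

Samples stay in rows and probe sets in columns, matching the layout of the probe intensities spreadsheet.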
We can now invoke the Violin plot. Make sure to have the filtered probe intensities spreadsheet selected (in blue) in the spreadsheet tree as shown (Figure 4).
Select View from the main taskbar
Select Violin Plot from the menu
A Violin Plot tab will open (Figure 5). This plot shows the intensity value ranges of the 23 genes (probe sets) for all samples as violin plots.
Figure 5. Viewing violin plots for 23 genes
Select View from the main taskbar
Select Toggle Properties
We can now see the plot properties panel to the left of the violin plot (Figure 6).
Figure 6. The violin plot can be configured using the plot properties panel
Although it is called the Violin plot, this visualization can also be used to display box and whisker plots, error bar plots, and gradient plots. For this example, we will generate box and whisker plots, summarized by Type (Down syndrome and normal), for each gene.
Select Box and Whisker Plot from the Plot type drop-down menu
Select Type from the Summarize by drop-down menu; this can be any categorical variable
Select Hide legend from Legend Options
Select Apply to modify the plot
The modified plot shows box and whisker plots, Down syndrome samples in red and normal in blue, for each gene (Figure 7).
Figure 7. Viewing average probe intensity values for two groups across 23 genes as box and whisker plots
To improve our view of the gene symbols, we can modify the X-axis legend.
Select X-Axis from the tabs in the plot properties panel
Set Text angle to 90 under Labels
Uncheck Truncate labels under Labels
Uncheck Show Outline under Blocks
Uncheck Columns under Attributes
Select Apply (Figure 8)
Figure 8. Configuring the X-axis label
The gene symbol for each column should now be visible (Figure 9). In cases where probe intensities for your genes of interest fall across a wide range, it may be helpful to normalize the probe intensity distributions of each gene. This is equivalent to what is done to display a heat map of probe intensity values.
Figure 9. X-axis now labeled with gene symbols for each gene
Select the Style tab
Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization options
Select Apply
The box and whisker plots are now centered with a mean of zero and scaled to have a standard deviation of one (Figure 10). Similar to a heat map, this makes it easier to visualize which genes are upregulated and which are downregulated. Here, we can see that most of the 23 genes are expressed more highly in Down syndrome patients.
Figure 10. Viewing normalized box and whisker plots
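The Standardize option can be illustrated with a short Python sketch that shifts one column (gene) to a mean of zero and scales it to a standard deviation of one. The function name `standardize_column` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the source does not state which estimator the software uses.

```python
import statistics


def standardize_column(values):
    """Shift a column to mean zero and scale it to standard deviation one."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / sd for v in values]


# Toy log2 intensities for one gene across five samples.
z = standardize_column([6.0, 7.0, 8.0, 9.0, 10.0])
```

After this transformation, genes measured on very different intensity scales can be compared on a common axis.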
Plots can also be split by categorical variables. We can use this to visualize differential expression of genes between Down syndrome and normal patients in different tissue types.
Select Configure profile
Select Switch to Advanced (Figure 11)
Figure 11. Simple options for configuring profiles in the plot
Select Sub-Plot for Tissue (Figure 12)
Figure 12. Configuring plot properties to split by Tissue
Select OK
Several options will need to be reconfigured before we apply this change.
Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization section
Select the X-axis tab
Set Text Angle to 90
Deselect Truncate labels
Deselect Show outline
Deselect Columns
Select Apply
There should now be a sub-plot for each category; in this case, there are four sub-plots, one for each tissue (Figure 13). Several plots have no error bars because there are not enough samples in those categories.
Figure 13. Splitting a plot by a categorical factor, Tissue, and grouping by another categorical variable, Type
These sub-plots can be displayed all together, or individually.
Select 1 from the Items/Page drop-down menu
You can now move through the sub-plots by selecting Next >.
Select All from the Items/Page drop-down menu to return to the 2x2 view
This data can also be displayed as a gradient plot (Figure 14) or error bar plot (Figure 15) by changing the Plot type using the drop-down menu in the Style tab. By default, the shading range in the gradient plot and the error bars show +/-1 standard deviation from the mean.
Figure 14. Gradient plot
Figure 15. Error bar plot
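As a rough illustration of what the error bars and gradient shading summarize, the following Python sketch computes the mean and the +/-1 standard deviation band for each group of a categorical variable. The function name `mean_sd_by_group` and the toy values are hypothetical, and the sample standard deviation is an assumption. A group with a single sample has no standard deviation, which is consistent with the missing error bars noted for sparsely populated categories.

```python
import statistics
from collections import defaultdict


def mean_sd_by_group(values, groups):
    """Return {group: (mean, (mean - sd, mean + sd))} for one gene.

    The band is None when a group has fewer than two samples.
    """
    grouped = defaultdict(list)
    for value, group in zip(values, groups):
        grouped[group].append(value)

    summary = {}
    for group, vals in grouped.items():
        mean = statistics.fmean(vals)
        if len(vals) > 1:
            sd = statistics.stdev(vals)  # assumption: sample SD
            summary[group] = (mean, (mean - sd, mean + sd))
        else:
            summary[group] = (mean, None)  # not enough samples for an SD
    return summary


summary = mean_sd_by_group(
    [6.0, 8.0, 7.0],
    ["Down syndrome", "Down syndrome", "Normal"],
)
```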
The final option, violin plot, cannot be used to display samples grouped by a categorical variable. To view a violin plot, we must remove the Summarize by selection.
Select (One profile per sub-plot) from the Summarize by drop-down menu
Select Violin plot from the Plot type drop-down menu
Select None - do not adjust values for Normalization
Select Apply
The plot now displays violin plots for each gene showing the distribution of probe intensity values for each tissue in a separate sub-plot (Figure 16).
Figure 16. Violin plots for each gene, sub-plots for each tissue
Partek Genomics Suite tutorials provide step-by-step instructions using a supplied data set to teach you how to use the software’s tools. Upon completion of each tutorial, you will be able to apply your knowledge in your own studies.
Download the data from the Partek site to your local disk. The zip file contains both data and annotation files.
Unzip the files to C:\Partek Training Data\Down_Syndrome-GE or to a directory of your choosing. Be sure to create a directory or folder to hold the contents of the zip file
Copy or move the annotation files (HG-U133A.cdf, HG-U133A.na36.annot, HG-U133A.na36.annot.idx) to C:\Microarray Libraries.
Copying the annotation files to the default library location is done because newer annotation files that are released after the publication of this tutorial may cause the results to be different than what is shown in the published tutorial. If, however, you prefer to download the latest version, you may omit copying the HG-U133A files to C:\Microarray Libraries.
Start Partek® Genomics Suite® and select Gene Expression from the Workflows panel on the right side of the tool bar in the main window (Figure 1)
Figure 1. Selecting the gene expression workflow
Select Import Samples under the Import section of the workflow
Select Import from Affymetrix CEL Files and then select OK
Select the Browse button and select the C:\Partek Training Data\Down_Syndrome-GE folder. By default, all the files with a .CEL extension are selected (Figure 2)
Figure 2. Selecting the folder and CEL files for the experiment
Select the Add File(s) > button to move all the .CEL files to the right panel. Twenty-five CEL files will be processed
Select the Next > button to open the Import Affymetrix CEL Files dialog (Figure 3)
Figure 3. Configuring import files window
Select Customize… to open the Advanced Import Options dialog (Figure 4)
Figure 4. Configuring the Advanced Import Options dialog
Select Library Files… to open the Specify File Locations dialog (Figure 5). This dialog is used to specify the location of the library folder and the annotation files
Figure 5. Specifying Microarray Library files or changing the default library directory
Partek Genomics Suite will automatically assign the annotation files according to the chip type stored in the .CEL files. If the annotation files are not available in the library directory, Partek Genomics Suite will automatically download and store them in the Default Library File Folder.
The default library location can be modified by selecting Change... in the Default Library File Folder panel. By default, the library directory is at C:\Microarray Libraries. This directory is used to store all the external libraries and annotation files needed for analysis and visualization. The library directory can also be modified from Tools > File Manager in the main Partek Genomics Suite menu
Select OK (Figure 5) to close the Specify File Locations dialog
Select the Outputs tab from the Advanced Import Options dialog (Figure 6)
Figure 6. Specifying Advanced Import Options to create chip images of and extract the scan date from the CEL files
In the Extract Time Stamp and Date from CEL File panel, make sure the Date button is selected to extract the chip scan date. This information can help you detect if there are batch effects caused by the process time
In the Quality Assess of Gene Expression panel, leave the QC report button unselected. A user guide for the microarray data quality assessment and quality control features is available in the User’s Manual
Select OK to exit the Advanced Import Options dialog
Select Import. The progress bar on the lower left of the Import Affymetrix CEL files dialog will update as .CEL files are imported. Once all files have been imported, the Import Affymetrix CEL Files dialog will close
After the .CEL files have been imported, the result file will open in Partek Genomics Suite as a spreadsheet named 1 (Down_Syndrome-GE). The spreadsheet should contain 25 rows representing the microarray chips (samples) and over 22,000 columns representing the probe sets (genes) (Figure 7).
Figure 7. Viewing the main or top-level spreadsheet
For additional information on importing data into Partek Genomics Suite, see Chapter 4 Importing and Exporting Data in the Partek User’s Manual. The User’s Manual is available from the Partek Genomics Suite software main menu Help > User’s Manual. The FAQ (Help > On-line Tutorials > FAQ) may also be helpful. As this tutorial only addresses some topics, you may need to consult the User’s Manual for additional information about other useful features.
It is recommended that you are familiar with Chapter 6 The Pattern Visualization System of the User’s Manual before going through the next section of the tutorial.
This tutorial will illustrate:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the request form to obtain this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
Down syndrome is caused by an extra copy of all or part of chromosome 21; it is the most common non-lethal trisomy in humans. At the time of the study used in this tutorial, conflicting reports had thrown into doubt whether individuals with Down syndrome have dysregulation of gene expression throughout the genome or primarily in genes from chromosome 21. To address this question, Affymetrix GeneChip™ Human U133A arrays were used to assay 25 samples taken from 10 human subjects, with or without Down syndrome, and 4 different tissues. The data revealed a significant upregulation of chromosome 21 genes at the gene expression level in individuals with Down syndrome; this dysregulation was largely specific to chromosome 21 and not a genome-wide phenomenon.
The raw data is available as experiment number GSE1397 in the Gene Expression Omnibus (GEO).
Data and associated files for this tutorial can be downloaded using this link - (right-click the link and choose "Save Link As" to download the tutorial data).
Twenty-five CEL files (samples) have been imported into Partek Genomics Suite. Sample information must be added to define the grouping and the goals of the experiment.
Select Add Sample Attributes in the Import section of the Gene Expression workflow panel
Choose the option Add Attributes from an Existing Column
Select OK to open the Sample Information Creation dialog
In this tutorial, the file name (e.g., Down Syndrome-Astrocyte-748-Male-1-U133A.CEL) contains the information about a sample and is separated by hyphens (-). Choosing to split the file name by delimiters will separate the categories into different columns
In the Sample Information panel, specify the column labels (Labels 1-4) as Type, Tissue, Subject, and Gender, set each as categorical, and set the other columns as skip (Figure 1). Select OK
Figure 1. Configuring the Sample Information Creation dialog
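The delimiter-based split performed by the Sample Information Creation dialog can be sketched in Python as follows. The function name `parse_sample_name` is hypothetical; the file name and the four labels are taken from the steps above, and fields beyond the fourth are skipped, as in the dialog configuration.

```python
def parse_sample_name(file_name, labels=("Type", "Tissue", "Subject", "Gender")):
    """Split a CEL file name on hyphens and label the leading fields."""
    stem = file_name.rsplit(".", 1)[0]  # drop the .CEL extension
    fields = stem.split("-")
    # zip stops after the last label, so the remaining fields are skipped.
    return dict(zip(labels, fields))


attrs = parse_sample_name("Down Syndrome-Astrocyte-748-Male-1-U133A.CEL")
```

All four attributes come back as text, which matches setting each column as categorical.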
A dialog window asking if you would like to save the spreadsheet with the new sample attribute will appear. Select Yes
Make column 5. (Subject) random by right-clicking on the column header and selecting Properties from the pop-up menu (Figure 2).
Figure 2. Changing column properties
Select the Random Effect check box from the Properties dialog (Figure 3) then select OK.
Figure 3. Setting column to Random Effect
The column 5. (Subject) will now be colored red, indicating that it is a random effect.
At this point in analysis, you should explore the data preliminarily. Do the genes you expected to be differentially regulated appear to have larger or smaller intensity values? Do similar samples resemble each other?
The latter question can be explored using Principal Components Analysis (PCA), an excellent method for reducing and visualizing high-dimensional data.
Select PCA Scatter Plot from the QA/QC section of the Gene Expression workflow
A Scatter Plot tab containing your PCA plot will open (Figure 1).
Figure 1. PCA Scatter Plot tab
In the scatter plot, each point represents a chip (sample) and corresponds to a row on the top-level spreadsheet. The color of the dot represents the Type of the sample; red represents a normal sample and blue represents a Down syndrome sample. Points that are close together in the plot have similar intensity values across the probe sets on the whole chip, while points that are far apart in the plot are dissimilar
Left-clicking on any point in the scatter plot selects that point. A dash with an identifying row number will appear on the selected PCA plot point. The spreadsheet in the Analysis tab will also jump to the corresponding row.
As you can see from rotating the plot, there is no clear separation between Down syndrome and normal samples in this data since the red and blue samples are not separated in space. However, there are other factors that may separate the data.
Color the points by column 4. Tissue and Size the points by column 3. Type
Select OK
Figure 2. Configuring the PCA scatter plot: Color by Tissue, size by Type
Notice now that the data are clustered by different tissues (Figure 3).
Figure 3. PCA scatter plot configured with color by Tissue, size by Type
Another way to see the cluster pattern is to put an ellipse around the Tissue groups.
Open the Plot Rendering Properties dialog and select the Ellipsoids tab
Select Add Ellipse/Ellipsoid
Select Ellipse in the Add Ellipse/Ellipsoid... dialog
Double click on Tissue in the Categorical Variable(s) panel to move it to the Grouping Variable(s) panel (Figure 4)
Select OK to close the Add Ellipse/Ellipsoid... dialog and select OK again to exit the Plot Rendering Properties dialog
Figure 4. Adding Ellipses to PCA Scatter Plot
By rotating this PCA plot, you can see that the data is separated by tissues, and within some of the tissues, the Down syndrome samples and normal samples are separated. For example, in the Astrocyte and Heart tissues, the Down syndrome samples (small dots) are on the left, and the normal samples (large dots) are on the right (Figure 5).
Figure 5. PCA scatter plot with ellipses, rotated to show separation by Type
PCA is an example of exploratory data analysis and is useful for identifying outliers and major effects in the data. From the scatter plot, you can see that the tissue is the biggest source of variation. There are many genes that express differently between the tissues, but not as many genes that express differently between type (Down syndrome and normal) across the whole chip.
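To make the idea behind the scatter plot concrete, here is a minimal Python sketch that finds the first principal component of a small samples-by-probe-sets matrix using power iteration. This is an illustration of the general technique under stated assumptions, not the algorithm Partek Genomics Suite uses; the function names and the toy intensities are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column (probe set) mean from its values."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    Power iteration on X^T X, computed as X^T (X w) so the covariance
    matrix is never formed explicitly.
    """
    x = center_columns(rows)
    n_features = len(x[0])
    # Deterministic, non-uniform start vector (an arbitrary choice).
    w = [float(j + 1) for j in range(n_features)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, w)) for row in x]
        new_w = [sum(s * row[j] for s, row in zip(scores, x))
                 for j in range(n_features)]
        norm = math.sqrt(sum(v * v for v in new_w))
        if norm == 0:
            break  # no variance along the current direction
        w = [v / norm for v in new_w]
    scores = [sum(a * b for a, b in zip(row, w)) for row in x]
    return w, scores


# Toy data: samples 0-1 mimic one tissue, samples 2-3 another.
samples = [
    [10.0, 2.0, 5.0],
    [11.0, 2.5, 5.1],
    [2.0, 9.0, 5.0],
    [1.0, 9.5, 4.9],
]
loadings, scores = first_principal_component(samples)
```

Samples from the same toy tissue receive scores with the same sign on the first component, which is the numerical counterpart of points clustering together in the plot. The overall sign of a principal component is arbitrary.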
The next step is to draw a histogram to examine the samples.
Select Sample Histogram in the QA/QC section of the Gene Expression workflow to generate the Histogram tab (Figure 6)
Figure 6. Histogram tab
The histogram plots one line for each of the samples with the intensity of the probes graphed on the X-axis and the frequency of the probe intensity on the Y-axis. This allows you to view the distribution of the intensities to identify any outliers. In this dataset, all the samples follow the same distribution pattern indicating that there are no obvious outliers in the data. As demonstrated with the PCA plot, if you click on any of the lines in the histogram, the corresponding row will be highlighted in the spreadsheet 1 (Down_Syndrome-GE). You can also change the way the histogram displays the data by clicking on the Plot Properties button. Feel free to explore these options on your own.
The decision to discard any samples would be based on information from the PCA plot, sample histogram plot, and QC metrics. To discard a sample and renormalize the data (without the effects of the outlier), start over with importing samples and omit the outlier sample(s) during the .CEL file import.
Analysis of variance (ANOVA) is a very powerful technique for identifying differentially expressed genes in a multi-factor experiment such as this one. In this data set, ANOVA will be used to generate a list of genes that are significantly different between Down syndrome and normal samples with an absolute difference bigger than 1.3 fold.
The ANOVA model should include Type because it is the primary factor of interest. From the exploratory analysis using the PCA plot, we observed that tissue is a large source of variation; therefore, Tissue should be included in the model. In the experiment, multiple samples were taken from the same subject, so Subject must be included in the model. If Subject were excluded from the model, the ANOVA assumption that samples within groups are independent would be violated. Additionally, the PCA scatter plot showed that the Down syndrome and normal samples separated within tissue type, so the Type*Tissue interaction should be included in the model.
To invoke the ANOVA dialog, select Detect Differentially Expressed Genes in the Analysis section of the Gene Expression workflow
In the Experimental Factor(s) panel, select Type, Tissue, and Subject by holding Ctrl and left-clicking each factor
Use the Add Factor > button to move the selections to the ANOVA Factor(s) panel
Select both Type and Tissue by holding Ctrl on your keyboard and left-clicking each factor
Select the Add Interaction > button to add a Type * Tissue interaction to the ANOVA Factor(s) panel (Figure 1)
Do NOT select OK or Apply. We will be adding contrasts to this ANOVA model in an upcoming section of the tutorial.
Figure 1. Configuring ANOVA factors and interactions
Most factors in ANOVA are fixed effects, whose levels in a data set represent all the levels of interest. In this study, Type and Tissue are fixed effects. If the levels of a factor in a data set only represent a random sample of all the levels of interest (for example, Subject), the factor is a random effect. The ten subjects in this study represent only a random sample of the global population about which inferences are being made. Random effects are colored red on the spreadsheet and in the ANOVA dialog. When the ANOVA model includes both random and fixed factors, it is a mixed-model ANOVA.
Another way to determine if a factor is random or fixed is to imagine repeating the experiment. Would the same levels of each factor be used again?
Type – Yes, the same types would be used again - a fixed effect
Tissue – Yes, the same tissues would be used again - a fixed effect
Subject - No, the samples would be taken from other subjects- a random effect
You can specify which factors are random and which are fixed when you import your data or after importing by right-clicking on the column corresponding to a categorical variable, selecting Properties, and checking Random Effect. By doing that, the ANOVA will automatically know which factors to treat as random and which factors to treat as fixed.
The subject factor in the ANOVA model is listed as “5. Subject (3. Type)”, which means that Subject is nested in Type. Partek Genomics Suite can automatically detect this sort of hierarchical design and will adjust the ANOVA calculation accordingly.
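A nested design of this kind can be checked with a simple rule: every level of the inner factor (Subject) occurs with exactly one level of the outer factor (Type). The Python sketch below illustrates that rule. The function name `is_nested` is hypothetical, the subject IDs other than 748 are made up, and this is not a description of how Partek Genomics Suite performs its detection.

```python
def is_nested(inner, outer):
    """True if every level of `inner` occurs with exactly one level of `outer`."""
    seen = {}
    for inner_level, outer_level in zip(inner, outer):
        # setdefault records the first outer level seen for this inner level.
        if seen.setdefault(inner_level, outer_level) != outer_level:
            return False
    return True


# Toy sample table: one entry per sample.
subject = ["748", "748", "1001", "1001"]
sample_type = ["Down Syndrome", "Down Syndrome", "Normal", "Normal"]
tissue = ["Astrocyte", "Heart", "Astrocyte", "Heart"]

subject_in_type = is_nested(subject, sample_type)
subject_in_tissue = is_nested(subject, tissue)
```

In the toy table, each subject has a single Type but contributes more than one tissue, so Subject is nested in Type and not in Tissue.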
By default, an ANOVA only outputs a p-value for each factor/interaction. To get the fold change and ratio between Down syndrome and normal samples, a contrast must be set up.
Select Contrasts… to invoke the Configure dialog
Choose 3. Type from the Select Factor/Interaction drop-down list. The levels in this factor are listed on the Candidate Level(s) panel on the left side of the dialog
Left click to select Down Syndrome from the Candidate Level(s) panel and move it to the Group 1 panel (renamed Down Syndrome) by selecting Add Contrast Level > in the top half of the dialog.
Label 1 will be changed to the subgroup name automatically, but you can also manually specify the label name.
Select Normal from the Candidate Level(s) panel and move it to the Group 2 panel (renamed Normal)
The Add Contrast button can now be selected (Figure 2)
Figure 2. Adding a contrast of Down Syndrome and Normal samples
Because the data is log2 transformed, Partek Genomics Suite will detect this and automatically select Yes for Data is already log transformed? in the top right-hand corner of the dialog. Partek Genomics Suite will use the geometric mean of the samples in each group to calculate the fold change and mean ratio for the contrast between the Down syndrome and normal samples.
Select Add Contrast to add the Down Syndrome vs. Normal contrast
Select OK to apply the configuration
If successfully added, the Contrasts… button will now read Contrasts Included (Figure 3)
Figure 3. ANOVA configuration with contrasts included
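The relationship between log2 data, the geometric mean, and the contrast outputs can be sketched in Python. Averaging log2 values and back-transforming gives the geometric mean on the linear scale, so the ratio of geometric means is 2 raised to the difference of the log2 means. This sketch uses plain group means rather than the estimates from the fitted ANOVA model, and the signed fold change convention (ratios below 1 reported as the negative reciprocal) is an assumption inferred from the -1.3 and 1.3 cutoffs used later in this tutorial. The function name `contrast_fold_change` is hypothetical.

```python
import statistics


def contrast_fold_change(log2_group1, log2_group2):
    """Return (ratio, signed fold change) for group 1 vs. group 2."""
    diff = statistics.fmean(log2_group1) - statistics.fmean(log2_group2)
    ratio = 2.0 ** diff  # ratio of geometric means on the linear scale
    # Assumed convention: ratios below 1 become the negative reciprocal.
    fold_change = ratio if ratio >= 1.0 else -1.0 / ratio
    return ratio, fold_change


up = contrast_fold_change([7.0, 7.0, 7.0], [6.0, 6.0, 6.0])
down = contrast_fold_change([6.0], [7.0])
```

Under this convention, a gene with half the expression in group 1 has a ratio of 0.5 and a fold change of -2.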
By default, Specify Output File is checked and gives a name to the output file. If you are trying to determine which factors should be included in the model and you do not wish to save the output file, simply uncheck this box
Select OK in the ANOVA dialog to compute the 3-way mixed-model ANOVA
Several progress messages will display in the lower left-hand side of the ANOVA dialog while the results are being calculated.
The result will be displayed in a child spreadsheet, ANOVA-3way (ANOVAResults). In this spreadsheet, each row represents a probe set and the columns represent the computation results for that probe set (Figure 4). Although not synonymous, probe set and gene will be treated as synonyms in this tutorial for convenience. By default, the genes are sorted in ascending order by the p-value of the first categorical factor. In this tutorial, Type is the first categorical factor, which means the most significantly differentially expressed gene between Down syndrome and normal samples is at the top of the spreadsheet in row 1.
Figure 4. ANOVA spreadsheet
For additional information about ANOVA in Partek Genomics Suite, see Chapter 11 Inferential Statistics in the User’s Manual (Help > User’s Manual).
Deciding which factors to include in the ANOVA may be an iterative process while you decide which factors and interactions are relevant as not all factors have to be included in the model. For example, in this tutorial, Gender and Scan date were not included. The Sources of Variation plot is a way to quantify the relative contribution of each factor in the model towards explaining the variability of the data.
Select View Sources of Variation from the Analysis section of the Gene Expression workflow with the ANOVA result spreadsheet active
A Sources of Variation tab will appear (Figure 5) with a bar chart showing the signal-to-noise ratio for each factor across the whole genome. Sources of variation can also be viewed as a pie chart showing sum of squares by selecting the Pie Chart (Sum of Squares) tab in the upper left-hand side of the Sources of Variation tab.
Figure 5. Sources of Variation tab showing a bar chart
This plot presents the mean signal-to-noise ratio of all the genes on the microarray. All the non-random factors in the ANOVA model are listed on the X-axis (including error). The Y-axis represents the mean of the ratios of mean square of all the genes to the mean square error of all the genes. Mean square is ANOVA’s measure of variance. Compare the bar for each signal to the bar for error; if a factor's bar is higher than error's bar, that factor contributed significant variation to the data across all the variables. Notice that this plot is very consistent with the results in the PCA scatter plot. In this data, on average, Tissue is the largest source of variation.
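As a simplified illustration of a signal-to-noise ratio built from mean squares, the Python sketch below computes the factor and error mean squares for a one-way ANOVA on each gene and then averages the per-gene ratios. This is one reading of the description above, restricted to a single factor; the tutorial itself uses a 3-way mixed model, which this sketch does not reproduce. The function names and toy values are hypothetical.

```python
import statistics


def one_way_mean_squares(values, groups):
    """Return (mean square for the factor, mean square error) for one gene."""
    grand_mean = statistics.fmean(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)

    ss_factor = sum(
        len(vals) * (statistics.fmean(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    ss_error = sum(
        (v - statistics.fmean(vals)) ** 2
        for vals in by_group.values()
        for v in vals
    )
    ms_factor = ss_factor / (len(by_group) - 1)
    ms_error = ss_error / (len(values) - len(by_group))
    return ms_factor, ms_error


def mean_signal_to_noise(genes, groups):
    """Average the per-gene ratio of factor mean square to error mean square."""
    ratios = []
    for values in genes:
        ms_factor, ms_error = one_way_mean_squares(values, groups)
        ratios.append(ms_factor / ms_error)  # assumes ms_error is not zero
    return statistics.fmean(ratios)


groups = ["a", "a", "a", "b", "b", "b"]
genes = [
    [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # strong group effect
    [5.0, 5.0, 6.0, 5.0, 6.0, 5.0],  # no group effect
]
first_gene = one_way_mean_squares(genes[0], groups)
snr = mean_signal_to_noise(genes, groups)
```

A ratio near 1 means the factor explains about as much variation as random error; a much larger ratio means the factor is a substantial source of variation.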
To view the source of variation for each individual gene, right click on a row header in the ANOVA-3way (ANOVAResults) spreadsheet and select Sources of Variation from the pop-up menu. This generates a Sources of Variation tab for the individual gene. View a few Sources of Variation plots from rows at the top of the ANOVA table and a few from the bottom of the table.
Another useful graph is an ANOVA Interaction Plot.
Right-click on a row header in the ANOVA spreadsheet (Figure 6)
Select ANOVA Interaction Plot to generate an Interaction Plot tab for that individual gene
Figure 6. Calling an ANOVA Interaction Plot for a gene
Generate these plots for rows 3 (DSCR3) and 8 (CSTB). If the lines in the interaction plot are not parallel, there may be an interaction between Tissue and Type. Error bars show the standard error of the least squares mean. DSCR3 is a good example of this (Figure 7). We can look at the p-values in column 9, p-value(Type * Tissue), to check whether this apparent interaction is statistically significant.
Figure 7. Interaction Plot for DSCR3
We can view the expression levels of a gene for each sample using a dot plot.
Right click on the gene row header and select Dot Plot (Orig. Data) from the pop-up menu. This generates a Dot Plot tab for the selected gene (Figure 8)
Figure 8. Dot Plot showing DSCR3 expression levels for each sample
In the plot, each dot is a sample of the original data. The Y-axis represents the log2 normalized intensity of the gene and the X-axis represents the different types of samples. The median expression differs between the groups in this example: the median of the Down syndrome samples is ~6.3, while the median of the normal samples is ~6.0. The line inside each Box & Whiskers plot represents the median of the samples in that group. Placing the mouse cursor over a Box & Whiskers plot will show its median and range.
Now that you have obtained statistical results from the microarray experiment, you can create new spreadsheets containing just those genes that pass certain criteria. This will streamline data management by focusing on just those genes with the most significant differential expression or substantial fold change. The List Manager can be used to specify numerous conditions for selecting genes of interest. In this tutorial, we are going to create a list of genes that have a fold change greater than 1.3 or less than -1.3 and an unadjusted p-value of < 0.0005.
Invoke the List Manager dialog by selecting Create Gene List in the Analysis section of the Gene Expression workflow
Ensure that the 1/ANOVA-3way (ANOVAResults) spreadsheet is selected as this is the spreadsheet we will be using to create our new gene list as shown (Figure 1)
Select the ANOVA Streamlined tab.
In the Contrast: find genes that change between two categories panel, select Down Syndrome vs. Normal and then select Have Any Change from the Setting drop-down menu
This will find genes with different expression levels in the different types of samples.
In the Configuration for “Down Syndrome vs Normal” panel, check that Include size of the change is selected and enter 1.3 into Change > and -1.3 in OR Change <
Select Include significance of the change, choose unadjusted p-value from the dropdown menu, and < 0.001 for the cutoff
The number of genes that pass your cutoff criteria will be shown next to the # Pass field. In this example, 30 genes pass the criteria.
Set Save the list as A
Select Create to generate the new list A
Select Close to view the new gene list spreadsheet
Figure 1. Creating a gene list from ANOVA results
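The logic of these criteria can be expressed as a simple filter: keep a gene when its fold change is above the positive cutoff or below the negative cutoff, and its p-value is below the significance cutoff. The Python sketch below uses hypothetical gene names and values; the function name `passes_cutoffs` is also hypothetical, and both cutoffs are parameters whose defaults mirror the dialog settings in the steps above.

```python
def passes_cutoffs(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.001):
    """True if the change is large enough in either direction and significant."""
    big_change = fold_change > fc_cutoff or fold_change < -fc_cutoff
    return big_change and p_value < p_cutoff


# Toy ANOVA results; gene names and numbers are made up for illustration.
results = [
    {"gene": "gene_a", "fold_change": 1.8, "p_value": 0.0002},
    {"gene": "gene_b", "fold_change": -1.5, "p_value": 0.0004},
    {"gene": "gene_c", "fold_change": 1.1, "p_value": 0.0001},  # change too small
    {"gene": "gene_d", "fold_change": 2.0, "p_value": 0.02},    # not significant
]

gene_list = [
    r["gene"] for r in results
    if passes_cutoffs(r["fold_change"], r["p_value"])
]
```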
The spreadsheet Down_Syndrome_vs_Normal (A) will be created as a child spreadsheet under the Down_Syndrome-GE spreadsheet.
This gene list spreadsheet can now be used for further analysis such as hierarchical clustering, gene ontology, integration of copy number data, or be exported into other data analysis tools such as pathway analysis.
Next, we will generate a list of genes that passed a p-value threshold of 0.05 and fold-changes greater than 1.3 using a volcano plot.
Select the 1/ANOVA-3way (ANOVAResults) spreadsheet in the Analysis tab. This is the spreadsheet our gene list will be drawn from
Select View > Volcano Plot from the Partek Genomics Suite main menu (Figure 2)
Figure 2. Generating a Volcano Plot from ANOVA results
Set X Axis (Fold-Change) to 12. Fold-Change(Down Syndrome vs. Normal), and the Y axis (p-value) to be 10. p-value(Down Syndrome vs. Normal)
Select OK to generate a Volcano Plot tab for genes in the ANOVA spreadsheet (Figure 3)
Figure 3. Volcano plot generated from ANOVA spreadsheet
In the plot, each dot represents a gene. The X-axis represents the fold change of the contrast (Down syndrome vs. Normal), and the Y-axis represents the range of p-values. The genes with increased expression in Down syndrome samples are on the right side of the N/C (no change) line; genes with reduced expression in Down syndrome samples are on the left. The genes become more statistically significant with increasing Y-axis position. The genes that have larger and more significant changes between the Down syndrome and normal groups are in the upper-right and upper-left corners.
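As a rough sketch of how a gene's position on a volcano plot follows from its fold change and p-value, the function below assigns a gene to a region of the plot. It assumes the common convention that the Y-axis is -log10(p-value), which the text does not state explicitly, and it uses cutoffs of 1.3-fold and p = 0.05 as example values. The function name `volcano_section` is hypothetical.

```python
import math


def volcano_section(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.05):
    """Return (row, column) describing where a gene falls on a volcano plot."""
    # Assumed convention: the y-axis is -log10(p), so smaller p-values plot higher.
    y = -math.log10(p_value)
    row = "upper" if y > -math.log10(p_cutoff) else "lower"
    if fold_change > fc_cutoff:
        col = "right"   # increased expression beyond the cutoff
    elif fold_change < -fc_cutoff:
        col = "left"    # reduced expression beyond the cutoff
    else:
        col = "middle"  # change too small in either direction
    return row, col


print(volcano_section(2.0, 0.001))  # ('upper', 'right')
print(volcano_section(-1.5, 0.2))   # ('lower', 'left')
```

Two vertical cutoffs and one horizontal cutoff give three columns and two rows, which is where the six sections of a divided volcano plot come from.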
In order to select the genes by fold-change and p-value, we will draw a horizontal line to represent the p-value 0.05 and two vertical lines indicating the –1.3 and 1.3-fold changes (cutoff lines).
Choose the Axes tab
Check Select all points in a section to allow Partek Genomics Suite to automatically select all the points in any given section
Select the Set Cutoff Lines button and configure the Set Cutoff Lines dialog as shown (Figure 4)
Figure 4. Setting cutoff lines for -1.3 to 1.3 fold changes and a p-value of 0.05
Select OK to draw the cutoff lines
Select OK in the Plot Rendering Properties dialog to close the dialog and view the plot
The plot will be divided into six sections. By clicking on the upper-right section, all genes in that section will be selected.
Right-click on the selected region in the plot and choose Create List to create a list including the genes from the section selected (Figure 5). Note that these p-values are uncorrected
Figure 5. Creating a gene list from a volcano plot
Note: If no column is selected in the parent (ANOVA) spreadsheet, all of the columns will be included in the gene list; if some columns are selected, only the selected columns will be included in the list.
Specify a name for the gene list (example: volcano plot list) and write a brief description about the list.
The description is shown when you right-click on the spreadsheet > Info > Comments. Here, I have named the list "volcano plot list" and described it as "Genes with >1.3 fold change and p-value <0.05" (Figure 6). The list can be saved as a text file (File > Save As Text File) for use in reports or by downstream analysis software.
Figure 6. Saving a list created from a volcano plot
To save changes to the spreadsheet, select the Save Active Spreadsheet icon (). Spreadsheets with unsaved changes have an asterisk next to their name in the spreadsheet tree.
Note: More details on Random vs. Fixed Effects can be found later in this tutorial under the section .
While pressing the mouse wheel down, drag the mouse to rotate the plot or select the Rotate Mode icon () on the left side of the Scatter Plot tab. With Rotate Mode selected, press the left mouse button and drag to rotate the plot. Rotating the plot allows you to examine the grouping pattern or outliers of the data on the first three principal components (PCs).
To zoom in and out, scroll the mouse wheel up or down while the cursor is on the PCA plot, or select the Zoom Mode icon () on the left side of the Scatter Plot tab.
Selecting the Reset icon () option on the left side of the Scatter Plot tab will return the PCA plot to its original orientation and zoom.
In the Scatter Plot tab, select the Rendering Properties icon () and configure the plot as shown (Figure 2)
You can practice creating new gene list criteria of your own to become familiar with the List Manager tool. For more information, you can always click on the () buttons.
Select Rendering Properties ()
The original experiment is listed on the Gene Expression Omnibus as GSE848; however, this tutorial only uses a subset of the original experiment and should be downloaded from the Partek website tutorial page, Gene Expression Analysis with Batch Effects.
Download the zipped project folder, Breast_Cancer-GE.zip
Unzip the project folder to C:/Partek Training Data/ or a directory of your choosing
This location should be easily accessible. The unzipped Breast_Cancer-GE project folder and a zipped annotation file will be added to the selected directory.
Unzip the included annotation file, HG_U95Av2.na32.annot.rar
Move the annotation file, HG_U95Av2.na32.annot, to the microarray libraries folder
By default, the microarray libraries folder will be located at C:/Microarray Libraries, but the location may vary depending on your operating system and configuration.
Open Partek Genomics Suite
Select () from the main command bar
Navigate to the tutorial folder, Breast_Cancer-GE
Select Breast_Cancer.txt
Select Open (Figure 1)
Figure 1. Opening a data file. The red Partek Genomics Suite icon is shown next to the data file (FMT file format)
The spreadsheet will open as 1 (Breast_Cancer.txt) (Figure 2).
Figure 2. Breast_Cancer.txt data file
The summary at the bottom of the spreadsheet shows there are 18 rows and 12,631 columns in the spreadsheet. The first column contains the Filename listing the GEO GSM number. This is also an identifier for the microarray. Treatment, Time, and Batch are in columns 2, 3, and 4, respectively. Column 6 marks the beginning of the probesets. The data is log2 transformed.
During data importation, the GeneChip annotation file was linked to the imported data. This linked annotation information can be added as new columns to the ANOVA or gene list spreadsheets. For example, we can add additional annotation to the gene list we created from the ANOVA results as follows:
In the Down_Syndrome_vs_Normal (A) spreadsheet, right-click on the second column header 2. ProbesetID and select Insert Annotation from the pop-up menu (Figure 1)
Figure 1. Inserting an annotation
Select Chromosomal Location under the Column Configuration panel (Figure 2). Leave everything else as default
Select OK
Figure 2. Adding Chromosomal Location annotation
Interestingly, of the 23 genes of the Down_Syndrome_vs_Normal (A) spreadsheet, 20 genes are located on chromosome 21. This suggests that the gene expression changes associated with Down syndrome observed in this study are primarily located on chromosome 21, not distributed throughout the genome, an important finding of this study.
To save changes to the spreadsheet, select the Save Active Spreadsheet icon ().
The gene list in spreadsheet Down_Syndrome_vs_Normal (A) can be used for hierarchical clustering to visualize patterns in the data.
Under the Visualization section in the Gene Expression workflow, select Cluster Based on Significant Genes
The Cluster Significant Genes dialog asks you to specify the type of clustering you want to perform.
Choose Hierarchical Clustering and select OK
Choose the Down_Syndrome_vs_Normal (A) spreadsheet under the Spreadsheet with differentially expressed genes
Choose the Standardize – shift genes to mean of zero and scale to standard deviation of one under the Expression normalization panel (Figure 1)
This option will adjust all the gene intensities such that the mean is zero and the standard deviation is 1.
Figure 1. Configuring Hierarchical Clustering
Select OK to generate a Hierarchical Clustering tab (Figure 2)
Figure 2. Hierarchical Clustering of Down_Syndrome_vs_Normal (A)
The graph (Figure 2) illustrates the standardized gene expression level of each gene in each sample. Each gene is represented in one column, and each sample is represented in one row. Genes with no difference in expression have a value of zero and are colored black. Genes with increased expression in Down syndrome samples have positive values and are colored red. Genes with reduced expression in Down syndrome samples have negative values and are colored green. Down syndrome samples are colored red and normal samples are colored orange. On the left-hand side of the graph, we can see that the Down syndrome samples cluster together.
For more information on the methods used for clustering, you can refer to Chapter 8: Hierarchical & Partitioning Clustering in Help > User’s Manual. For a tutorial on configuring the clustering plot, please refer to Hierarchical Clustering Analysis
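The Standardize option described above (shift each gene to a mean of zero and scale to a standard deviation of one) corresponds to a z-score transformation. The sketch below shows the arithmetic for a single gene across samples. It assumes the sample standard deviation (n - 1 denominator); the text does not say which form the software uses, and the intensity values are invented.

```python
import statistics


def standardize(values):
    """Shift values to mean zero and scale them to standard deviation one."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in values]


# Made-up log2 intensities of one gene across five samples.
gene_intensities = [6.0, 7.0, 8.0, 9.0, 10.0]
z_scores = standardize(gene_intensities)
print([round(z, 3) for z in z_scores])
```

After standardization, a value of zero means the sample sits at the gene's average, positive values are above average, and negative values are below average, which matches the black, red, and green coloring of the heat map.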
This tutorial will illustrate:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
The data for this tutorial is taken from an experiment that examined the effects of four treatment conditions at two time points on estrogen receptor-positive breast cancer cell lines in vitro. Each treatment/time combination has two replicates and there are two control samples for a total of eighteen samples. Gene expression analysis was performed using the Affymetrix GeneChip® Human U95A array. Values are transformed to log base 2 scale by f(x) = log2(x+1).
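For readers who want to see the transformation f(x) = log2(x+1) in concrete terms, here is a minimal sketch. The raw intensity values are made up; adding 1 before taking the logarithm keeps a raw value of zero defined (it maps to zero).

```python
import math


def log2_plus_one(intensity):
    """Apply f(x) = log2(x + 1) to a single raw intensity value."""
    return math.log2(intensity + 1)


raw = [0, 1, 3, 1023]  # made-up raw intensities
transformed = [log2_plus_one(x) for x in raw]
print(transformed)  # [0.0, 1.0, 2.0, 10.0]
```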
Gene lists can be visualized and their ability to distinguish samples evaluated using a hierarchical clustering heat map. Because of the batch effect in this data set, we will perform hierarchical clustering using batch-corrected intensity values. To do this, we need to open the fourtreatments list of differentially expressed genes as a child spreadsheet of the batch-remove spreadsheet.
Select fourtreatments from the spreadsheet tree
Select () to close the spreadsheet
Select 1-removeresult (batch-remove) from the spreadsheet tree
Select File from the main tool bar
Select Open as child...
Select fourtreatments using the file browser
The fourtreatments spreadsheet will open as a child spreadsheet of batch-remove (Figure 1).
Figure 1. The fourtreatments spreadsheet is open as a child spreadsheet of batch-remove. Visualizations performed using fourtreatments will pull intensity values from batch-remove.
Visualizations performed using the fourtreatments spreadsheet will now use intensity values from the batch-remove spreadsheet.
To invoke hierarchical clustering, follow the steps below.
Select Cluster Based on Significant Genes from the Visualization section of the Gene Expression workflow
Select Hierarchical Clustering
Select OK
Select 1-removeresult/1 (fourtreatments) from the drop-down menu
Select Standardize for Expression normalization (Figure 2)
Figure 2. Configuring the Cluster the significant genes dialog
Select OK
The hierarchical clustering heat map will open in a new tab (Figure 3).
Figure 3. Hierarchical clustering of genes with significantly different expression across the treatment groups
For detailed information about the methods used for clustering, refer to the Partek Manual Chapter 8: Hierarchical & Partitioning Clustering.
The List Manager can be used to generate lists of genes by applying criteria such as fold change and false discovery rate (FDR) adjusted p-value thresholds.
Select the Analysis tab
Select ANOVAResults in the spreadsheet tree
Select Create Gene List from the Analysis section of the Gene Expression workflow (Figure 1)
Figure 1. Selecting Create Gene List from the Gene Expression workflow
Select E2 vs. Control from the Contrast panel of the ANOVA Streamlined tab in the List Manager dialog
Deselect the Include size of the change option
Set p-value with FDR < to 0.1 (Figure 2)
Figure 2. Configuring the List Manager using the ANOVA Streamlined filtering options
There should be ~545 probe(sets)/genes that meet this threshold.
Select Create
A new spreadsheet, E2 vs. Control, will be added as a child spreadsheet of Breast_Cancer.txt.
Repeat the steps listed above to create lists for E2+ICI vs. Control (~24 genes), E2+Ral vs. Control (~22 genes), and E2+TOT vs. Control (~177 genes) with the same threshold
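To give a sense of what an FDR-adjusted p-value threshold does, the sketch below implements the Benjamini-Hochberg step-up adjustment. This is an assumption made for illustration: the text does not state which FDR method the List Manager uses, and the p-values are invented. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return step-up FDR-adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease
    # as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up unadjusted p-values
adj_p = benjamini_hochberg(raw_p)
n_pass = sum(1 for p in adj_p if p < 0.1)
print(n_pass)  # 4
```

Because the adjustment depends on how many tests were run, the same raw p-value can pass the 0.1 threshold in one data set and fail it in another.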
Now we can use the Venn Diagram to create a list of genes that are differentially regulated in all treatment groups.
Select the Venn Diagram tab in the List Manager dialog
The Venn Diagram shows overlap between selected gene lists.
Select the four created lists (E-H) in the spreadsheet list in the List Manager dialog by selecting each while holding the Ctrl key on your keyboard
The Venn Diagram will display the number of overlapping and distinct genes from the four lists (Figure 3).
Figure 3. Viewing the Venn Diagram with intersections of four lists of significant genes
The intersection of the four ellipses shows that 14 differentially regulated genes are in common between the four treatment schemes.
Select the region intersecting all four ellipses
Right-click the intersected region
Select Create List From Highlighted Regions
Select Close to exit the List Manager dialog
The new list will appear in the spreadsheet tree with a temporary file name (ptpm).
Select the temporary list in the spreadsheet tree
Save the list as fourtreatments
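The central region of the Venn Diagram is simply the set intersection of the four gene lists. A minimal sketch with Python sets, using invented gene identifiers in place of the real list contents:

```python
# Made-up gene lists keyed by contrast name; real lists would hold probeset IDs.
gene_lists = {
    "E2 vs. Control": {"g1", "g2", "g3", "g4", "g5"},
    "E2+ICI vs. Control": {"g1", "g2", "g6"},
    "E2+Ral vs. Control": {"g1", "g2", "g3"},
    "E2+TOT vs. Control": {"g1", "g2", "g4", "g7"},
}

# Genes present in every list (the region where all four ellipses overlap).
common = set.intersection(*gene_lists.values())
print(sorted(common))  # ['g1', 'g2']

# Genes found in the first list only (a non-overlapping region of one ellipse).
others = [genes for name, genes in gene_lists.items() if name != "E2 vs. Control"]
only_e2 = gene_lists["E2 vs. Control"] - set.union(*others)
print(sorted(only_e2))  # ['g5']
```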
While many types of data sets are automatically linked with appropriate annotation files upon import, if this does not occur, a spreadsheet can be manually linked with an annotation file.
Right-click Breast_Cancer.txt in the spreadsheet tree
Select Properties (Figure 1)
Figure 1. Selecting file properties for a spreadsheet
Configure the Configure Genomic Properties as shown (Figure 2) with the following steps:
Select Gene Expression from the Choose the type of genomic data drop-down menu
Select Feature in column label
Select Browse...
Select HG_U95Av2.na36.annot.csv from the microarray libraries folder
Select Set Column
Select Gene Symbol from the Choose column containing gene symbol/microRNA name dialog
Select Homo sapiens and hg19 from the Species and Genome Build drop-down menus
Figure 2. Configure the genomic properties dialog as shown
There is now an * after the spreadsheet name in the spreadsheet tree. This indicates an unsaved change has been made to the spreadsheet.
Principal Components Analysis (PCA) is an excellent method to visualize similarities and differences between the samples in a data set. PCA can be invoked through a workflow, by selecting () from the main command bar, or by selecting Scatter Plot from the View section of the main toolbar. We will use a workflow.
Select Gene Expression from the Workflows drop-down menu
Select PCA Scatter Plot from the QA/QC section of the Gene Expression workflow
The PCA scatter plot will open as a new tab (Figure 1).
Figure 1. Viewing the PCA scatter plot. Each point is a sample. Samples are colored by treatment.
In this PCA scatter plot, each point represents a sample in the spreadsheet. Points that are close together in the plot are more similar, while points that are far apart in the plot are more dissimilar.
To better view the data, we can rotate the plot.
Click and drag to rotate the plot
Rotating the plot allows us to look for outliers in the data on each of the three principal components (PC1-3). The percentage of the total variation explained by each PC is listed by its axis label. The chart label shows the sum percentage of the total variation explained by the displayed PCs.
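The percentage of variation explained by each PC comes from the eigenvalues of the data's covariance matrix. The sketch below shows this for the simplest possible case, two variables, where the eigenvalues have a closed form. It is a toy illustration with invented numbers, not the computation the software performs on thousands of probesets, and the function name `explained_variance_2d` is hypothetical.

```python
import math


def explained_variance_2d(xs, ys):
    """Return the percent of total variance explained by PC1 and PC2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eig1, eig2 = half_trace + root, half_trace - root
    total = eig1 + eig2
    return 100 * eig1 / total, 100 * eig2 / total


# Made-up measurements of two variables across five samples.
pc1, pc2 = explained_variance_2d([1.0, 2.0, 3.0, 4.0, 5.0],
                                 [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(pc1, 1), round(pc2, 1))
```

The percentages always sum to 100 when every component is counted; a plot that displays only the first three PCs reports the sum of just those three.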
We can change the plot properties to better visualize the effects of different variables.
Set Shape to 4. Batch
Set Size to 3. Time
Set Connect to 5. Treatment Combination
Select OK (Figure 2)
Figure 2. Configuring plot properties to color by treatment, shape by batch, size by time, and connect by treatment combination
The PCA scatter plot now shows information about treatment, batch, and time for each sample (Figure 3).
Figure 3. PCA scatter plot showing treatment, batch, and time information for each sample. A batch effect is clearly visible.
PCA is particularly useful for identifying outliers and batch effects in data sets. We can see a batch effect in this dataset as samples separate by batch. To make this clearer, we can add ellipses by Batch.
Select Ellipsoids from the tab
Select Add Ellipse/Ellipsoid
Select Ellipse
Select Batch from the Categorical Variable(s) panel and move it to the Group Variable(s) panel
Select OK
Select OK to close the dialog
The ellipses help illustrate that the data is separated by batch (Figure 4).
Figure 4. Ellipses around batch groups show that samples separate by batch
Ways to address the batch effect in the data set will be detailed later in this tutorial.
Illumina’s MethylationEPIC array interrogates the methylation status of over 850,000 cytosines in the human genome. Because the MethylationEPIC array is closely related to the Infinium HumanMethylation450 BeadChip, the steps presented in this document can be applied to either platform.
This tutorial illustrates how to:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
The data set accompanying this document consists of sixteen human samples processed by Illumina MethylationEPIC arrays. The data set is taken from a study of DNA methylation in human B cells and B cells infected with Epstein-Barr virus (EBV).
Infecting B cells with EBV in vitro transforms them, making them capable of indefinite growth in vitro. These immortalized cell lines are referred to as lymphoblastoid cell lines (LCLs). LCLs behave similarly to activated B cells, making them useful for expanding T cells in vitro. Because EBV is a carcinogen and immortalized cell growth is a hallmark of cancer, examining the effects of EBV transformation on B cell DNA methylation might shed light on the roles of DNA methylation in tumor development.
By including Batch in the ANOVA model, the variability due to the batch effect is accounted for when calculating p-values for the non-random factors. In this sense, the batch effect has already been removed. However, visualizing biological effects can be very difficult if batch effects are present in the original intensity data used to generate visualizations. We can modify the original intensity data to remove the batch effect using the Remove Batch Effect tool.
The Remove Batch Effect tool functions much like ANOVA in reverse, calculating the variation attributed to the factor being removed then adjusting the original intensity values to remove the variation. Once the variation caused by the batch effect has been removed, tools like PCA or clustering can be used to visualize what the data would look like if the batch effect was not present.
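The idea of estimating the variation attributed to a factor and then subtracting it can be sketched in a deliberately simplified form. The function below handles one gene and one factor only: it estimates each batch's offset from the grand mean and subtracts that offset. This is an assumption-laden simplification, not the tool's actual algorithm, which fits an ANOVA model containing the other factors as well; the two approaches would only be expected to agree when the design is balanced across batches. The name `remove_batch_means` and the data are hypothetical.

```python
def remove_batch_means(values, batches):
    """Subtract each batch's offset from the grand mean (single gene, single factor)."""
    grand_mean = sum(values) / len(values)
    batch_mean = {}
    for batch in set(batches):
        members = [v for v, b in zip(values, batches) if b == batch]
        batch_mean[batch] = sum(members) / len(members)
    # The estimated batch effect is the batch mean minus the grand mean.
    return [v - (batch_mean[b] - grand_mean) for v, b in zip(values, batches)]


# Made-up log2 intensities for one gene: batch B reads two units higher than batch A.
values = [5.0, 6.0, 7.0, 8.0]
batches = ["A", "A", "B", "B"]
corrected = remove_batch_means(values, batches)
print(corrected)  # [6.0, 7.0, 6.0, 7.0]
```

After the adjustment both batches share the same mean, while the differences between samples within a batch are preserved.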
Select the 1 (Breast_Cancer.txt) spreadsheet
Select Stat from the main tool bar
Select Remove Batch Effect... (Figure 1)
Figure 1. Invoking the Remove Batch Effect tool
The Remove Batch Effects dialog will open. The tool functions by performing an ANOVA then modifying the original intensity values to remove the effects of the specified factor(s).
Select Treatment, Time, and Batch
Select Add Factor > to add them to the ANOVA Factor(s) panel
Select Batch in the ANOVA Factor(s) panel
Select Add Factor > to add Batch to the Remove Effect(s) of These Factor(s) panel
By default, the results will be displayed in a new spreadsheet. Options to overwrite the current spreadsheet and specify the output file appear at the bottom of the dialog (Figure 2).
Figure 2. Configuring the Remove Batch Effects tool to remove Batch and create a new spreadsheet
Select OK
The new spreadsheet, 1-removeresult (batch-remove) will open in the Analysis tab (Figure 3).
Figure 3. Viewing the new spreadsheet with batch effects removed
We can visualize the effects of removing the batch effects using PCA.
Select 1 (Breast_Cancer.txt) from the spreadsheet tree
Set Drawing Mode to Mixed
Select the Ellipsoids tab
Select Add Centroid
Add Batch to the Grouping Variable(s) panel
Set the colors of the two centroids as shown (Figure 4) to pink and yellow
Figure 4. Adding a centroid for Batch
Select OK to close the Add Centroid...
Select OK to close the Configure Plot Properties dialog
The two centroids are distinct, showing the batch effect (Figure 5).
Figure 5. Viewing a batch effect using PCA. The batches are shown as the pink (A) and yellow (B) centroids. The clear separation of the centroids indicates a batch effect
Repeat the above steps for 1-removeresult (batch-remove)
For 1-removeresult (batch-remove), the centroids of the two batches overlap, showing that the batch effect has been removed (Figure 6).
Figure 6. Overlapping centroids for batches A and B show that the batch effect has been removed.
Visualization of ANOVA results for single probe(sets)/genes also benefits from batch removal. To illustrate this, we first need to repeat our ANOVA using the new batch-remove intensity values spreadsheet.
Select the Analysis tab
Select 1-removeresult (batch-remove) in the spreadsheet tree
Select Stat from the main toolbar
Select ANOVA...
Add Treatment, Time, and Batch factors to the ANOVA Factor(s) panel
Add Treatment * Time interaction to the ANOVA Factor(s) panel
Select Contrasts...
Select Treatment from the Select Factor/Interaction drop-down menu
Select Yes for Data is already log transformed?
Set up contrasts of treatment vs. control for E2, E2+ICI, E2+Ral, and E2+TOT (Figure 7)
Figure 7. Configuring ANOVA to compare treatment groups to control
Select OK to add contrasts
Change output file name to ANOVAResults_batch-remove
Select OK to perform the ANOVA
The ANOVAResults_batch-remove spreadsheet will open in the Analysis tab.
Select the ANOVAResults spreadsheet
Right-click on the row header for row 2, TFF1
Select Dot Plot (Orig. Data) (Figure 8)
Figure 8. Invoking a dot plot from the ANOVAResults spreadsheet
A dot plot for trefoil factor 1 (TFF1) will open (Figure 9). The dot plot shows gene intensity values (y-axis) for each sample. Samples are grouped by Treatment.
Figure 9. Viewing the dot plot for trefoil factor 1 (TFF1) across different treatment groups
To visualize the batch effect we will make a few changes to the plot.
Select H/V to switch the horizontal and vertical axis
Set Color to Batch
Set Size to Time
Set Connect to Treatment Combination (Figure 10)
Figure 10. Configuring the dot plot (part 1 of 2)
Select the Labels tab
Select Column for In Point Labels
Select Time from the Column drop-down list (Figure 11)
Figure 11. Configuring the dot plot (part 2 of 2)
Select OK
The dot plot now clearly shows the batch effect (Figure 12). Samples within treatment groups are separated clearly between the two batches shown in blue and red.
Figure 12. Viewing a dot plot showing a batch effect. Each dot is a sample. The y-axis is treatment combinations; the x-axis is the expression value of the TFF1 gene. Dots are colored by batch, sized by time, connected by treatment combination, and labeled by time.
To view the effects of batch removal, we can view this dot plot for the ANOVAResults_batch-remove spreadsheet.
Select the Analysis tab
Select ANOVA-3way (ANOVAResults_batch-remove) from the spreadsheet tree
Repeat the steps shown above to create the dot plot for trefoil factor 1
The dot plot invoked from the ANOVAResults_batch-remove spreadsheet shows that the batch effect has been removed, as the samples no longer clearly separate by color within treatment groups (Figure 13).
Figure 13. Viewing the dot plot that shows batch effect removal. The plot configuration matches Figure 12.
Analysis of variance (ANOVA) is a very powerful technique for identifying differentially expressed genes in a multi-factor experiment. In this data set, ANOVA will be used to generate a list of genes that are significantly differentially regulated by each treatment.
When setting up the ANOVA, the primary factors of interest, Treatment and Time, should be included. We will also include the interaction between Treatment and Time, Treatment * Time, because we are interested in whether different treatments behave differently over time. From our exploratory analysis using PCA, we also know that Batch is a major source of variation and needs to be included. Including Batch as a random factor will allow us to account for the batch effect.
Select Detect differentially expressed genes from the Analysis section of the Gene Expression workflow
Select Treatment, Time, and Batch in the Experimental Factor(s) panel
Select Add Factor > to move the selections to the ANOVA Factor(s) panel
Select both Treatment and Time in the Experimental Factor(s) panel by holding Ctrl on the keyboard while selecting each
Select Add Interaction > to add the Treatment * Time interaction to the ANOVA Factor(s) panel (Figure 1)
Do not select OK or Apply. We still need to add linear contrasts to the ANOVA model
Figure 1. Adding factors and interactions to the ANOVA
ANOVA will output a p-value and F ratio for each factor or interaction; to get the fold-change and ratio between the different levels of a factor or interaction, linear contrasts, or comparisons, must be added.
Select Contrasts... in the ANOVA dialog (Figure 1)
Select Yes for Data is already log transformed?
Select Treatment * Time from the Select Factor/Interaction drop-down menu
We will add contrasts comparing each of the four treatment groups to the control group.
Select E2 * 8 and E2 * 48 from the Candidate Level(s) panel
Select Add Contrast Level > to move them to the top panel (Group 1) on the right-hand side
The Group 1 panel will be renamed after the contents of the panel. We can specify a name for the group.
Set Label of the top panel to E2
Select Control * 0 from the Candidate Level(s) panel
Select Add Contrast Level > to move it to the bottom panel (Group 2) on the right-hand side
Set Label of the bottom panel to Control
The lower panel (Group 2) is considered the reference level. Because the data is log2 transformed, the geometric mean will be used to calculate the fold change and mean ratio to place both on a linear scale instead of a log scale.
Select Add Contrast (Figure 2)
Figure 2. Adding a contrast between E2 vs. Control at all time points.
To examine the time points of each treatment condition separately, we can select Add Combinations instead of Add Contrast. This adds every possible contrast for the levels in the Group 1 and Group 2 panels.
Select E2 * 8 and E2 * 48 from the Candidate Level(s) panel
Select Add Contrast Level > to move them to the top panel (Group 1) on the right-hand side
Select Control * 0 from the Candidate Level(s) panel
Select Add Contrast Level > to move it to the bottom panel (Group 2) on the right-hand side
Select Add Combinations to add contrasts for E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 (Figure 3)
Figure 3. Add Combinations creates contrasts for every combination of levels from the two group panels.
For this tutorial, we will not be considering the time points of each treatment condition individually. We can remove the E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 contrasts.
Select E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 from the contrasts list
Select Delete
We will now add contrasts for the other treatment conditions.
Add contrasts for E2+ICI vs. Control, E2+Ral vs. Control, and E2+TOT vs. Control following the steps outlined for E2 vs. Control
There should now be four contrasts added to the contrasts panel (Figure 4).
Figure 4. Fully configured contrasts for the tutorial
Select OK to add the contrasts to the ANOVA model
The Contrasts... button should now read Contrasts Included in the ANOVA dialog.
Select OK to perform the ANOVA
The result of the 3-way mixed model ANOVA is displayed in a new spreadsheet, ANOVA-3way (ANOVAResults) that is a child of the Breast_Cancer.txt spreadsheet. In ANOVAResults, each row represents a probe(set)/gene with the columns containing the results of the ANOVA (Figure 5).
Figure 5. Viewing the ANOVA Results spreadsheet. Probe(sets)/genes are on rows and the ANOVA results are on columns.
By default, the rows are sorted in ascending order by the p-value of the first factor, which places the most significantly differentially expressed gene between different treatments at the top of the spreadsheet.
Each factor in the ANOVA adds p-value, F value, and SS value columns. F value is a ratio of signal to noise; high values indicate that the probe(set)/gene explains variation in the data set due to the factor. SS value is the sum of squares.
Each contrast in the ANOVA adds p-value, ratio, and fold-change columns. The p-value is calculated using log space. The ratio and fold change are calculated using linear space.
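The relationship between log2 group means and the linear-scale ratio and fold change can be sketched as follows. The difference of two log2 means, raised as a power of 2, equals the ratio of geometric means. The signed fold-change convention used here (ratios below 1 are reported as the negative reciprocal, so no value falls between -1 and 1) is an assumption that is consistent with the -1.3 and 1.3 cutoffs used in these tutorials; the function name and values are hypothetical.

```python
def contrast_ratio_and_fold_change(group1_log2, group2_log2):
    """Return (ratio, fold change) of group 1 relative to group 2 (the reference)."""
    mean1 = sum(group1_log2) / len(group1_log2)
    mean2 = sum(group2_log2) / len(group2_log2)
    # Subtracting log2 means and exponentiating gives the ratio of geometric means.
    ratio = 2 ** (mean1 - mean2)
    # Assumed convention: express decreases as a negative fold change.
    fold_change = ratio if ratio >= 1 else -1 / ratio
    return ratio, fold_change


# Made-up log2 intensities for one gene.
print(contrast_ratio_and_fold_change([9.0, 9.0], [8.0, 8.0]))  # (2.0, 2.0)
print(contrast_ratio_and_fold_change([7.0, 7.0], [8.0, 8.0]))  # (0.5, -2.0)
```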
Sources of variation captured in the ANOVA can be viewed for the entire data set or for individual probe(sets)/genes.
Select View Sources of Variation from the Analysis section of the Gene Expression workflow
The Sources of Variation plot will open in a new tab (Figure 6).
Figure 6. Viewing the sources of variation plot. Non-random factors are included when ANOVA is run using the default REML model.
This plot presents the signal to noise ratio across all probe(sets)/genes for each of the non-random factors and interactions in the ANOVA model. The y-axis represents the average mean square or F ratio, the ANOVA measure of variance, for all the probesets. Each bar is a factor and random error is also included. If the factor has a greater mean F ratio than Error, the factor contributed significant variation to the data set.
Note that Batch is not included as a factor. This is because Batch is a random factor and accounted for by the ANOVA model.
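To make the "signal to noise" reading of the F ratio concrete, here is a sketch of the F ratio for the simplest case, a one-way ANOVA on a single gene. It is far simpler than the mixed-model ANOVA fitted with REML in this tutorial and is offered only to show what the mean squares are; the function name `one_way_f_ratio` and the data are invented.

```python
def one_way_f_ratio(groups):
    """Return the one-way ANOVA F ratio: between-group over within-group mean square."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Signal: variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Noise: variation of the samples around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_values) - len(groups))
    return ms_between / ms_within


# Made-up log2 intensities of one gene in two groups of three samples.
print(one_way_f_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 13.5
```

An F ratio well above 1 indicates that the factor accounts for more variation than random error does for that gene.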
The sources of variation for each probe(set)/gene can be viewed individually.
Right-click on a row header in the ANOVAResults spreadsheet
Select Sources of Variation from the pop-up menu
Genes without changes in expression are given a value of zero and are colored black. Up-regulated genes have positive values and are displayed in red. Down-regulated genes have negative values and are displayed in green. Each sample is represented in a row while genes are represented as columns. Dendrograms illustrate clustering of samples and genes. To learn more about configuring the hierarchical clustering heat map, see the user guide.
Select () from the command bar
Select () to save the changes
Select () to activate Rotate Mode
Select () to open the Configure__Plot Properties dialog
The data files can be downloaded from Gene Expression Omnibus using the data set's accession number or its download link. To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program.
Select () to plot the PCA scatter plot
The plot will open in a new tab. For additional plots that can be invoked from the ANOVA results spreadsheet, see the user guide.
Gene Ontology (GO) enrichment analysis compares a gene list to lists of genes associated with biological processes, cellular components, and molecular functions to provide biological insights. Once a list of genes has been created, it is possible to see which GO terms the genes are associated with and whether any GO terms are significantly enriched in the gene list.
Select the E2 vs. Control spreadsheet from the spreadsheet tree
Select Gene Set Analysis from the Biological Interpretation section of the Gene Expression workflow
Select Next > to continue with GO Enrichment
Select Next > to continue with 1/E2_vs_Control (E2 vs. Control)
Select Next > to continue with default parameter settings
Select Next > to continue with the default mapping file
A new spreadsheet 1 (GO-Enrichment.txt) will open as a child spreadsheet of E2 vs. Control (Figure 1).
Figure 1. GO Enrichment results spreadsheet
GO terms are shown in rows and are sorted by ascending enrichment p-value.
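To make the idea of an enrichment p-value concrete, here is a minimal sketch. The text above does not state which test Partek Genomics Suite applies, so this assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the function name and the counts are hypothetical.

```python
from math import comb

def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """Probability of seeing at least `list_hits` genes from a GO term
    in a gene list of `list_size`, drawn without replacement from a
    background of `background_size` genes, of which `background_hits`
    carry the term (hypergeometric upper tail)."""
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 12 of 200 listed genes carry a term that
# annotates 300 of 20000 background genes.
p = enrichment_p_value(12, 200, 300, 20000)
```

A small p-value means the term appears in the gene list more often than random sampling from the background would suggest, which is why sorting by ascending p-value puts the most enriched terms at the top.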
To visualize the results, we can launch the Gene Ontology Browser.
Select View from the main tool bar
Select Gene Ontology Browser
The Gene Ontology Browser will open in a new tab (Figure 2).
Figure 2. Viewing GO enrichment results in the Gene Ontology Browser
The bar chart shows the GO terms with the highest enrichment scores for the gene list.
To learn more about GO enrichment and using the Gene Ontology Browser, please consult the Gene Ontology Enrichment tutorial.
To analyze differences in methylation between our experimental groups, we need to create a list of differentially methylated loci.
Select Create Marker List from the Analysis section of the Illumina BeadArray Methylation workflow
Select LCLs vs. B cells (Figure 1)
Figure 1. Creating a list of significantly differentially methylated loci
Leave Include size of the change selected and set to Change > 2 OR Change < -2
Leave Include significance of the change selected and set to p-value with FDR < 0.05
Select Create
Select Close to exit the list manager
The new spreadsheet LCLs vs. B cells (LCLs vs. B cells) will open in the Analysis tab.
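The two criteria used above (size of the change and FDR-adjusted significance) can be sketched as follows. This is an illustration only: it assumes the FDR option corresponds to the Benjamini-Hochberg step-up procedure, and the locus identifiers, field names, and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def marker_list(loci, change_cutoff=2.0, fdr_cutoff=0.05):
    """Keep loci with Change > cutoff OR Change < -cutoff, and FDR < cutoff."""
    adjusted = benjamini_hochberg([locus["p"] for locus in loci])
    return [
        locus["id"]
        for locus, q in zip(loci, adjusted)
        if q < fdr_cutoff
        and (locus["change"] > change_cutoff or locus["change"] < -change_cutoff)
    ]

# Hypothetical ANOVA results for four loci.
loci = [
    {"id": "cg_a", "p": 0.001, "change": 3.1},
    {"id": "cg_b", "p": 0.004, "change": -2.5},
    {"id": "cg_c", "p": 0.03, "change": 1.2},   # significant, change too small
    {"id": "cg_d", "p": 0.6, "change": 4.0},    # large change, not significant
]
selected = marker_list(loci)
```

Both conditions must hold for a locus to enter the list, so a large change without statistical support, or a significant but small change, is left out.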
It is best practice to occasionally save the project you are working on. Let's take the opportunity to do this now.
Select File from the main command toolbar
Select Save Project...
Specify a name for the project using the Save File dialog (we chose Methylation Tutorial)
Select Save to save the project
Saving the project saves the identity and child-parent relationships of all spreadsheets displayed in the spreadsheet tree. This allows us to open all relevant spreadsheets for our analysis by selecting the project file.
The list, LCLs vs B cells, includes differentially methylated loci for locations across the genome; however, in many cases we may want to focus on loci located in particular regions of the genome. To filter our list to include only regions of interest, we can use the annotations provided by Illumina and the interactive filter in Partek Genomics Suite.
Select LCLs_Vs_B_cells from the spreadsheet tree
Right-click on the Gene Symbol column
Select Insert Annotation (Figure 1)
Figure 1. Adding an annotation column to the ANOVA results
Select the Add as categorical option
Select Relation_to_UCSC_CpG_Island (Figure 2)
CpG islands are regions of the genome with an atypically high frequency of CpG sites. CpG islands and their surrounding regions (termed shelf and shore) include many gene promoters and altered methylation in these regions can have a disproportionate effect on gene expression. For example, hyper-methylation of promoter CpG islands is a common mechanism for down-regulating gene expression in cancer.
Figure 2. Adding chromosome location to ANOVA results
Select OK to add Relation_to_UCSC_CpG_Island as a column next to 3. Gene Symbol
Select () from the quick action bar to save the ANOVA-2way (ANOVA Results) spreadsheet with the added annotation
Now, we can filter probes by their relation to CpG islands.
Select () from the quick action bar to invoke the interactive filter
Select 4. Relation_to_UCSC_CpG_Island for Column
For categorical columns, the interactive filter displays each category of the selected column as a colored bar. For 4. Relation_to_UCSC_CpG_Island, each bar represents one of the categories of the UCSC annotation. To filter out a category, left-click on its bar. Right-clicking on a bar will include only the selected category. A pop-up balloon will show the category label as you mouse over each bar.
Right-click the Island column to filter out other columns (Figure 3)
Figure 3. Using Interactive Filter tool to filter out probes by annotation. When pointed to a categorical column, the Interactive Filter tool summarises the content of the column by a column chart. Left-click to exclude a category (two columns were excluded, so they are grayed out), right-click to include only
The yellow and black bar on the right-hand side of the spreadsheet panel shows the fraction of excluded cells in black and included cells in yellow. Right-clicking this bar brings up an option to clear the filter.
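The two filter actions (exclude a category, or include only one category) amount to simple row selection. The sketch below illustrates that logic outside the software; the function names, column name, probe identifiers, and category labels are hypothetical stand-ins for the annotation values.

```python
def exclude(rows, column, categories):
    """Left-click behavior: drop rows whose category is in `categories`."""
    return [row for row in rows if row[column] not in categories]

def include_only(rows, column, category):
    """Right-click behavior: keep only rows with the chosen category."""
    return [row for row in rows if row[column] == category]

# Hypothetical annotated probes.
rows = [
    {"probe": "cg_example_1", "relation": "Island"},
    {"probe": "cg_example_2", "relation": "N_Shore"},
    {"probe": "cg_example_3", "relation": "Island"},
    {"probe": "cg_example_4", "relation": ""},
]
island_only = include_only(rows, "relation", "Island")
without_island = exclude(rows, "relation", {"Island"})
included_fraction = len(island_only) / len(rows)  # share of rows kept
```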
Now that we have filtered out probes that are not in CpG islands, we will create a spreadsheet containing only these probes.
Right-click on the LCLs vs. B cells spreadsheet in the spreadsheet tree panel (Figure 4)
Figure 4. Cloning a filtered spreadsheet creates a new spreadsheet with only the included cells
Select Clone
Rename the new spreadsheet LCLs_vs_B_cells_CpG_Islands using the Clone Spreadsheet dialog
Select mvalues from the Create new spreadsheet as a child spreadsheet: drop-down menu (Figure 5)
Select OK
Figure 5. Renaming and configuring filtered spreadsheet
Select () from the quick action bar to save the filtered spreadsheet
Specify a name for the spreadsheet using the Save File dialog (we chose LCLs_vs_B_cells_CpG_Islands)
Select Save to save the spreadsheet
You may want to save the project before proceeding to the next section of the tutorial.
Principal component analysis (PCA) can be performed to visualize clusters in the methylation data, but also serves as a quality control procedure; outliers within a group could suggest poor data quality, batch effects, mislabeled samples, or uninformative groupings.
Select PCA Scatter Plot from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Scatter Plot tab
Select 2. Cell Type for Color by
Select 3. Gender for Size by
Select () to enable Rotate Mode
Left click and drag to rotate the plot and view different angles (Figure 1)
Each dot of the plot is a single sample and represents the average methylation status across all CpG loci. Two of the LCL samples do not cluster with the others, but we will not exclude them for this tutorial.
Figure 1. Principal components analysis (PCA) showing methylation profiles of the study samples. Each sample is represented by a dot, the axes are the first three PCs, and the numbers in parentheses indicate the fraction of variance explained by each PC. The number at the top is the variance explained by the first three PCs. The samples are colored by levels of 2. Cell Type
Next, the distribution of beta values across the samples can also be inspected using a box-and-whiskers plot.
Select Sample Box and Whiskers Chart from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Box and Whiskers tab
Each box-and-whisker is a sample and the y-axis shows beta-value ranges. Samples in this data set seem reasonably uniform (Figure 2).
Figure 2. Box and whiskers plot showing distribution of M-values (y-axis) across the study samples (x-axis). Samples are colored by a categorical attribute (Cell Type). The middle line is the median, box represents the upper and the lower quartile, while the whiskers correspond to the 90th and 10th percentile of the data
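The five numbers drawn for each sample (10th percentile, lower quartile, median, upper quartile, 90th percentile) can be computed as in the sketch below. It uses Python's standard `statistics` module; the exact percentile interpolation used by Partek Genomics Suite is not stated in this tutorial, so the results may differ slightly, and the function name and input values are hypothetical.

```python
from statistics import median, quantiles

def box_summary(values):
    """Return the five values drawn in a box-and-whiskers plot
    with whiskers at the 10th and 90th percentiles."""
    quartiles = quantiles(values, n=4)   # three cut points: Q1, Q2, Q3
    deciles = quantiles(values, n=10)    # nine cut points: P10 ... P90
    return {
        "p10": deciles[0],
        "q1": quartiles[0],
        "median": median(values),
        "q3": quartiles[2],
        "p90": deciles[8],
    }

# Hypothetical methylation values for one sample.
sample = [0.02, 0.05, 0.10, 0.35, 0.50, 0.62, 0.80, 0.88, 0.93, 0.97]
summary = box_summary(sample)
```

Comparing these summaries across samples is a quick way to spot a sample whose distribution is shifted or compressed relative to the rest.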
An alternative way to take a look at the distribution of beta-values is a histogram.
Select Sample Histogram from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Histogram tab
Again, no sample in the tutorial data set stands out (Figure 3).
Figure 3. Sample histogram. Each sample is a line, beta values are on the horizontal axis and their frequencies on the vertical axis. Two peaks correspond to two probe types (I and II) present on the MethylationEPIC array. Sample colors correspond to a categorical attribute (Cell Type)
To detect differential methylation between CpG loci in different experimental groups, we can perform an ANOVA test. For this tutorial, we will perform a simple two-way ANOVA to compare the methylation states of the two experimental groups.
Select Detect Differential Methylation from the Analysis section of the Illumina BeadArray Methylation workflow
A new child spreadsheet, mvalue, is created when Detect Differential Methylation is selected. M-values are an alternative metric for measuring methylation. β-values can be easily converted to M-values using the following equation: M-value = log2( β / (1 - β)).
An M-value close to 0 for a CpG site indicates a similar intensity between the methylated and unmethylated probes, which means the CpG site is about half-methylated. Positive M-values mean that more molecules are methylated than unmethylated, while negative M-values mean that more molecules are unmethylated than methylated. As discussed by Du and colleagues, the β-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels.
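The conversion given above, M-value = log2(β / (1 - β)), and its inverse can be written out directly. This is a minimal sketch of the formula only; the function names are hypothetical, and any offsets or clipping that the software may apply to β-values of exactly 0 or 1 are not covered.

```python
from math import log2

def beta_to_m(beta):
    """Convert a beta value (strictly between 0 and 1) to an M-value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be strictly between 0 and 1")
    return log2(beta / (1.0 - beta))

def m_to_beta(m_value):
    """Inverse conversion: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m_value / (2.0 ** m_value + 1.0)

half_methylated = beta_to_m(0.5)    # 0.0: methylated and unmethylated signals are equal
mostly_methylated = beta_to_m(0.8)  # positive: more methylated than unmethylated
mostly_unmethylated = beta_to_m(0.2)  # negative: more unmethylated than methylated
```

The M-value scale is symmetric around 0, so β = 0.8 and β = 0.2 map to values of equal size and opposite sign.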
Because we are performing differential methylation analysis, Partek Genomics Suite automatically creates an M-values spreadsheet to use for statistical analysis.
Select 2. Cell Type and 3. Gender from the Experimental Factor(s) panel
Select Add Factor > to move 2. Cell Type and 3. Gender to the ANOVA Factor(s) panel (Figure 1)
Figure 1. ANOVA setup dialog. Experimental factors listed on the left can be added to the ANOVA model.
Select Contrasts...
Leave Data is already log transformed? set to No
Leave Report comparisons as set to Difference
For methylation data, fold-change comparisons are not appropriate. Instead, comparisons should be reported as the difference between groups.
Select 2. Cell Type from the Select Factor/Interaction drop-down menu
Select LCLs
Select Add Contrast Level > for the upper group
Select B cells
Select Add Contrast Level > for the lower group
Select Add Contrast (Figure 2)
Figure 2. Configuring ANOVA contrasts
Select OK to close the Configuration dialog
The Contrasts... button of the ANOVA dialog now reads Contrasts Included
Select OK to close the ANOVA dialog and run the ANOVA
If this is the first time you have analyzed a MethylationEPIC array using the Partek Genomics Suite software, the manifest file may need to be configured. If it needs configuration, the Configure Annotation dialog will appear (Figure 3).
Select Chromosome is in one column and the physical location is in another column for Choose the column configuration
Select Ilmn ID for Marker ID
Select CHR for Chromosome
Select MAPINFO for Physical Position
Select Close
This enables Partek Genomics Suite to parse out probe annotations from the manifest file.
Figure 3. Processing the annotation file. User needs to point to the columns of the annotation file that contain the probe identifier as well as the chromosome and coordinates of the probe.
The results will appear as ANOVA-2way (ANOVAResults), a child spreadsheet of mvalue. Each row of the spreadsheet represents a single CpG locus (identified by Column ID).
Figure 4. ANOVA spreadsheet. Each row is a result of an ANOVA at a given CpG locus (identified by the Column ID column). The remaining columns contain annotation and statistical output
For each contrast, a p-value, Difference, Difference (Description), Beta Difference, and Beta Difference (Description) are generated. The Difference column reports the difference in M-values between the two groups while the Beta Difference column reports the difference in beta values between the two groups.
To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program. The .idat files can be downloaded in a zipped folder using this link - Differential Methylation Analysis data set.
Store the 32 .idat files in C:\Partek Training Data\Methylation or in a directory of your choosing. We recommend creating a dedicated folder for the tutorial.
Go to the Workflows drop down list, select Methylation (Figure 1)
Figure 1. Selecting the methylation workflow
Select Microarray Loci Methylation from the Methylation sub-workflows panel (Figure 2)
Figure 2. Selecting the Illumina BeadArray Methylation workflow
This opens the Illumina BeadArray Methylation workflow (Figure 3)
Figure 3. Illumina BeadArray Methylation workflow
Select Import Illumina Methylation Data to bring up the Load Methylation Data dialog
Select Import human methylation 450/850 .idat files (Figure 4)
Figure 4. Selecting human methylation 450/850 .idat file type for import
Select OK
Select Browse... to navigate to the folder where you stored the .idat files
All .idat files in the folder will be selected by default (Figure 5).
Figure 5. Selecting .idat files to import
Select Add File(s) > to move the files to the idat Files to Process pane of the Import Illumina iDAT Data dialog (Figure 6)
Figure 6. Confirming selection of .idat files for import
Select Next >
The following dialog (Figure 7) deals with the manifest file, i.e. probe annotation file. If a manifest file is not present locally, it will be downloaded in the Microarray libraries directory automatically. The download will take place in the background, with no particular message on the screen and it may take a few minutes, depending on the internet connection. In the future, you may want to reanalyze a data set using the same version of the manifest file used during the initial analysis, rather than downloading an up-to-date version. To facilitate this, the Manual specify option in the Manifest File section allows you to specify a specific version. For this tutorial, we will leave this on the default settings.
Figure 7. Selecting manifest file and output file
By default, the output file destination is set to the folder containing your .idat files and the file name matches the folder name. The name and location of the output file can be changed using the Output File panel.
Select Customize to view advanced options for data normalization
In the Algorithm tab of the Advanced Import Options dialog (Figure 8), there are two filtering options and five normalization options available. The filters allow you to exclude probes from the X and Y chromosomes or based on detection p-value. In this tutorial, we have male and female samples, so we will apply the X and Y chromosome Filter. We will also filter probes based on detection p-value to exclude low-quality probes.
Select Exclude X and Y Chromosomes
Analysis of differentially methylated loci in humans and mice often excludes probes on the X and Y chromosomes because of the difficulties caused by the inactivation of one X chromosome in female samples.
Select Exclude probes using detection p-value and leave the default settings of 0.05 and 1 out of 16 samples.
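The detection p-value filter can be pictured with the sketch below. It assumes that a probe is excluded when its detection p-value exceeds 0.05 in at least 1 of the samples, which is one reading of the default settings and is not confirmed by this tutorial; the function name, probe identifiers, and p-values are hypothetical.

```python
def failing_probes(detection_p, p_cutoff=0.05, min_failed_samples=1):
    """Return the set of probes to exclude.

    detection_p: dict mapping probe ID -> list of detection p-values,
    one per sample (assumed layout).
    """
    failing = set()
    for probe, p_values in detection_p.items():
        n_failed = sum(1 for p in p_values if p > p_cutoff)
        if n_failed >= min_failed_samples:
            failing.add(probe)
    return failing

# Hypothetical detection p-values for three probes across four samples.
detection_p = {
    "cg_example_1": [0.001, 0.002, 0.001, 0.003],
    "cg_example_2": [0.001, 0.20, 0.001, 0.002],   # fails in one sample
    "cg_example_3": [0.04, 0.03, 0.049, 0.01],
}
excluded = failing_probes(detection_p)
```

Raising `min_failed_samples` would make the filter more permissive, keeping probes that fail in only a few samples.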
We recommend using the default option for normalization; however, advanced users can select their preferred normalization method. Select the () next to each normalization option for details. If you want to import probe intensity, raw probe intensity, probe signals, raw probe signals, or anti-log probe intensity values, they can be added to the data import using the Outputs tab of the Advanced Import Options dialog. Probe intensities and raw probe intensities can be used for advanced troubleshooting purposes and antilog probe intensities can be used for copy number detection. The Outputs tab of the Advanced Import Options dialog also has an option to create NCBI GEO submission spreadsheets from your imported data. For this tutorial, we do not need to import any of these values or create GEO submission spreadsheets.
Figure 8. Advanced Import Options offers choice of normalization method and additional data outputs
Select OK to close the Advanced Import Options dialog
Select Import on the Import Illumina iDAT data dialog
The imported and normalized data will appear as a spreadsheet 1 (Methylation Tutorial) (Figure 9)
Figure 9. Viewing the imported methylation data in a spreadsheet
Partek Genomics Suite enables you to visualize each probe and compare the methylation between the groups at a single CpG site level.
Right-click row 5. SBNO2 in the LCLs_vs_B_Cells_CpG_Islands spreadsheet
Select Browse to Location from the pop-up menu
Figure 1. Browsing to location from spreadsheet with differentially expressed genes
The Chromosome View tab will open, zoomed in to the selected CpG locus in SBNO2 (Figure 2).
Figure 2. Viewing location in Genome Viewer
The Chromosome View visualization is composed of a series of tracks corresponding to annotation files and data files.
RefSeq Transcripts 2017-05-02 (hg19) (+): transcripts coded by the positive strand
RefSeq Transcripts 2017-05-02 (hg19) (-): transcripts coded by the negative strand
Regions: by default, difference in methylation (M-value) between the groups
Heatmap (1/mvalue): M values for all the samples
Barchart (Methylation): methylation level in M value of the selected sample (to select a sample, click on a heat map)
Heatmap (Methylation Tutorial): Beta values for all the samples
Barchart (Methylation): methylation level in Beta value of the selected sample (to select a sample, click on a heat map)
Cytoband: cytobands of the current chromosome
Genomic Label: coordinates on the current chromosome
To modify a track, select it in the Tracks panel to bring up its configuration options panel below the Tracks panel. Let's modify a few tracks to improve our visualization of the data.
Select the Regions track; its configuration options open on the Profile tab
Select Color tab
Set Color bars by to Difference (LCLs vs. B cells) (Description)
Select Apply to change
This will color regions according to whether methylation is increased or decreased.
Select the Heatmap (1/mvalue)
Select Remove Track
Select Bar Chart (Methylation) located directly below the Regions track
Select Remove Track
We can now more clearly see the Difference in M values for the region in the Regions track, the heatmap of beta values in the Heatmap track, and the beta value for the loci of the selected sample in the Bar Chart track.
Select a sample on the heatmap to view its beta value in the Bar Chart track (Figure 3)
Figure 3. Modify the tracks of the Genome Viewer to facilitate visual analysis
The New Track button allows new tracks to be added to the viewer, while the Remove Track button removes the selected track from the viewer. Tracks can be reordered by selecting a track in the Tracks panel and dragging it up or down to move it in the list. In the Chromosome View, select () for selection mode and () for navigation mode. In navigation mode, left-click and draw a box on any track to zoom in. All tracks are synced and will zoom together. Zooming can also be controlled using the interface in the lower right-hand corner of the tab (). View can be reset to the whole chromosome level using reset zoom (). Searching for a gene or transcript in the position box will also zoom directly to its location.
The available tracks can be supplemented with a special annotation file that can be built using a UCSC annotation file as the basis. Building and viewing the UCSC annotation file is available as an optional section of the tutorial, Optional: Add UCSC CpG island annotations.
Each row of the spreadsheet (Figure 1) corresponds to a single sample. The first column is the names of the .idat files and the remaining columns are the array probes. The table values are β-values, which correspond to the percentage methylation at each site. A β-value is calculated as the ratio of methylated probe intensity over the overall intensity at each site (the overall intensity is the sum of methylated and unmethylated probe intensities).
Figure 1. Spreadsheet after .idat file import: samples on rows (Sample IDs are based on file names), probes on columns, cell values are functionally normalized beta values (default settings)
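The β-value definition above (methylated intensity divided by the sum of methylated and unmethylated intensities) is shown below as a short sketch. It follows the definition as stated in this tutorial and ignores normalization and any stabilizing offset the software may add; the function name and intensities are hypothetical.

```python
def beta_value(methylated, unmethylated):
    """Ratio of methylated intensity to overall intensity at one site."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("overall intensity must be positive")
    return methylated / total

# Hypothetical probe intensities at three CpG sites.
fully_unmethylated = beta_value(0.0, 4000.0)   # 0.0
three_quarters = beta_value(3000.0, 1000.0)    # 0.75
fully_methylated = beta_value(5000.0, 0.0)     # 1.0
```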
Before we can perform any analysis, the study samples need to be organized into their experimental groups.
Select Add Sample Attributes from the Import section of the Illumina BeadArray Methylation workflow
Select Add a Categorical Attribute from the Add Sample Attributes dialog (Figure 2)
Figure 2. Adding sample attributes. Adding Attributes from an Existing Column can be used to split file names into sections, based on delimiters (e.g. _, -, space etc.). Adding a Numeric or Categorical Attribute enables the user to manually specify sample attributes
Select OK
The Create categorical attribute dialog allows us to create groups for a categorical attribute. By default, two groups are created, but additional groups can be added.
Set Attribute name: to Cell Type
Rename the groups B cells and LCLs
Drag and drop the samples from the Unassigned list to their groups as listed in the table below
There should now be two groups with eight samples in each group (Figure 3).
Figure 3. Adding Cell Type attribute as a categorical group
Select OK
Select Yes from the Add another categorical attribute dialog
Set Attribute name: to Gender
Rename the groups Male and Female
Drag and drop the samples from the Unassigned list to their groups as listed in the table below
There should now be two groups with four samples in Male and twelve samples in Female (Figure 4).
Figure 4. Adding Gender attribute as a categorical group
Select OK
Select No from the Add another categorical attribute dialog
Select Yes to save the spreadsheet
Two new columns have been added to spreadsheet 1 (Methylation) with the cell type and gender of each sample (Figure 5).
Figure 5. Annotated beta values spreadsheet
Partek Pathway provides a visualization tool for pathway enrichment spreadsheets utilizing the KEGG database. This tutorial will illustrate:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. To request this version, please fill out the request form, or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
The pathway enrichment analysis illustrated in this user guide uses the . This data set is also used in our .
Download and save the zipped project folder in an accessible location on your computer. The project folder for the tutorial will be created in the same location the zipped project folder is stored.
Import the project using the zipped project importer in Partek Genomics Suite.
Select File from the main toolbar
Select Import
Select Zipped Project...
Choose the zipped project folder, miRNA_tutorial_data
The project will open with three spreadsheets:
1. Affy_miR_BrainHeart_intensities,
2. Affy_HuGeneST_BrainHeart_GeneIntensities,
3. ANOVAResults gene.
An Illumina-type project file (.bsc format) can be imported into Illumina’s GenomeStudio® (please note: to process 450K chips, you need GenomeStudio 2010 or newer) and exported using the Partek Methylation Plug-in for GenomeStudio. For more information on the plug-in, please see its documentation. The plug-in creates six files: a Partek project file (*.ppj), an annotation file (*.annotation.txt), files containing intensity values (*.fmt and *.txt), and files containing β-values (*.fmt and *.txt) (Figure 1).
Figure 1. Output of Partek Methylation Plug-in for GenomeStudio
To load all the files automatically, open the .ppj file as follows.
Select Methylation from the Workflows drop-down menu
Select Illumina BeadArray Methylation from the Methylation sub-workflows section
Select Import Illumina Methylation Data from the Import section
Select Load a project following Illumina GenomeStudio export from the Load Methylation Data dialog
Partek Genomics Suite software can view annotation .BED files as tracks in the Genome Viewer. We can add a CpG islands track to the Genome Viewer using the UCSC Genome Browser CpG islands annotation.
Go to the UCSC Genome Browser website
Select Table Browser under Tools in the main command bar of the webpage (Figure 1)
Figure 1. Navigating to the Table Browser at the UCSC Genome Browser website
Configure the Table Browser page as shown (Figure 2)
Figure 2. Configuring the Table Browser to output CpG Islands BED file
Set assembly to Feb. 2009 (GRCh37/hg19)
Set group to Regulation
Set track to CpG Islands
Set table to cpgIslandExt
Set output format to BED
Set output file to cpg.bed
Select get output
The Output cpgIslandExt as BED page will open.
Select get BED to download a compressed folder containing the BED file
Unzip the file using 7-Zip, WinRAR, or a similar program of your choice to a location you will be able to find
Next, we can import the BED file into Partek Genomics Suite.
Select Genomic Database... under Import under File in the main toolbar in Partek Genomics Suite (Figure 3)
Figure 3. Importing the CpG Islands map BED file
Select the file cpg.bed
The BED file will open as a new spreadsheet.
Change the spreadsheet name to UCSC CpG Island Annotation and save it
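For readers who want to inspect the downloaded file outside the software, the sketch below reads BED records. In the BED format the first three tab-separated fields are the chromosome, a 0-based start, and an end coordinate that is not included in the interval; an optional fourth field holds a name. The function names and the example lines are hypothetical and are not taken from the actual cpg.bed file.

```python
def parse_bed_line(line):
    """Parse one BED record into a dict (first four fields only)."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),   # 0-based, inclusive
        "end": int(fields[2]),     # exclusive
        "name": fields[3] if len(fields) > 3 else "",
    }

def read_bed(lines):
    """Parse BED lines, skipping blank, comment, track, and browser lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        records.append(parse_bed_line(line))
    return records

# Made-up records in BED layout.
example_lines = [
    "track name=example\n",
    "chr1\t1000\t1500\tisland_A\n",
    "chr2\t20000\t20350\tisland_B\n",
]
islands = read_bed(example_lines)
lengths = [record["end"] - record["start"] for record in islands]
```

Because the end coordinate is exclusive, the length of each interval is simply end minus start.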
The approach described in previous sections relies on ANOVA to detect differentially methylated CpG sites and takes individual sites as a starting point for interpretation. Since ANOVA compares M values at each site independently, this strategy is robust to type I/type II probe bias.
An alternative could be to first summarize all the probes belonging to a CpG island region (i.e. island, N-shore, N-shelf, S-shore, S-shelf) and then use ANOVA to compare regions across the groups. Since the summarization will include both type I and type II probes, you may want to split the analysis in two branches and analyze type I and type II probes independently. To do this, we need to annotate each probe as type I or type II.
Select the mvalue spreadsheet
Select Transform from the main toolbar
Select Create Transposed Spreadsheet... from the Transform drop-down menu (Figure 1)
Figure 1. Creating a transposed spreadsheet
Select Sample ID for Column: and numeric for Data Type:
Select OK
A new temporary spreadsheet will be created with a row for each probe and columns for each sample.
Right-click on column 1. ID to bring up the pop-up menu
Select Insert Annotation
Select Add as categorical
Select Infinium_Design_Type and UCSC_CpG_Islands_Name from the Column Configuration options (Figure 2)
Figure 2. Adding Infinium design type and CpG island annotations
Select OK to add the Infinium design type and UCSC CpG island name as categorical columns on the spreadsheet
Now, we can use the interactive filter to create separate spreadsheets for type I and type II probes.
Select 2. Infinium_Design_Type from the drop-down menu if not selected by default
Left-click the type I column to exclude it
Right-click the temporary spreadsheet in the spreadsheet tree to bring up the pop-up dialog
Select Clone... (Figure 3)
Figure 3. Creating a probe list with only Infinium type II probes
Name the new spreadsheet female_only_typeII_probes
Select OK
Save the created spreadsheet (we chose the file name female_only_typeII_probes)
Repeat the process to create a spreadsheet for type I probes
The temporary spreadsheet is no longer needed so we can close it.
We can use these spreadsheets to generate lists of M values at CpG island regions
Select spreadsheet female_only_typeII_probes
Select Stat from the main toolbar
Select Column Statistics... under Descriptive (Figure 4)
Figure 4. Selecting column statistics
Add Mean to the Selected Measure(s) panel
Select Group By and set it to 3. UCSC_CpG_Islands_Name (Figure 5)
Figure 5. Configuring column statistics
Select OK
The new temporary spreadsheet has one CpG island region per row (Figure 6), samples on columns, and the values in the cells represent the mean of M values of all the CpG probes in the region.
Figure 6. New spreadsheet with average M values for probes at each CpG island; probes not at CpG islands are collected into the first row "- Mean"
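The grouping performed by Column Statistics here (mean M-value per CpG island, per sample) can be sketched as follows. The data layout, function name, island names, and values are hypothetical; probes without an island annotation are given an empty island name, mirroring the unlabeled row described above.

```python
from collections import defaultdict
from statistics import mean

def mean_by_island(probes):
    """Average M-values per CpG island for each sample.

    probes: list of dicts with 'island' (empty string if the probe is
    outside any island) and 'm_values' (one value per sample).
    """
    grouped = defaultdict(list)
    for probe in probes:
        grouped[probe["island"]].append(probe["m_values"])
    # zip(*rows) iterates sample by sample across the probes of one island.
    return {
        island: [mean(sample_values) for sample_values in zip(*rows)]
        for island, rows in grouped.items()
    }

# Hypothetical probes measured in two samples.
probes = [
    {"island": "chr1:100-200", "m_values": [1.0, 3.0]},
    {"island": "chr1:100-200", "m_values": [3.0, 5.0]},
    {"island": "", "m_values": [0.0, 0.0]},
]
island_means = mean_by_island(probes)
island_means.pop("", None)  # drop the row for probes outside CpG islands
```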
Note the first row, with label “– Mean”. It corresponds to all the probes that map outside of UCSC CpG islands. As it is not needed for the downstream analysis, we will remove it.
Right-click on the row header for Mean
Select Delete to remove the row
The final step is to transpose the data back to its original orientation.
Select Transform from the main toolbar
Select Create Transposed Spreadsheet... from the Transform drop-down menu
Select 2. Level for Column: and numeric for Data Type:
Select OK
The layout of the new transposed spreadsheet is as follows: one sample per row with CpG island regions on columns; cell entries correspond to mean methylation status of the region (Figure 7). The column with a blank value for the column header is the average of all probes not associated with CpG island regions. You can delete this column if you like.
Figure 7. Spreadsheet with average M values of probes in each CpG island for each sample
Right-click the transposed spreadsheet, 2_transpose
Select Save as... from the pop-up menu
Name it mvalues_typeII_probes_CpG_islands
The mvalues_typeII_probes_CpG_islands spreadsheet can be used as a starting point for ANOVA and other analyses. You can also repeat the steps above to create an equivalent spreadsheet for type I probes.
To perform gene set and pathway analysis, we need to create a list of genes that overlap with differentially methylated CpG loci.
Select LCLs_vs_B_cells_CpG_Islands in the spreadsheet tree
Select Find Overlapping Genes from the Analysis section of the workflow
The Output Overlapping Features dialog will open (Figure 1). This dialog allows you to choose the annotation database that will define where genes are located. By default, the promoter region will be defined as 5000 base pairs upstream and 3000 base pairs downstream from the transcription start site.
Figure 1. Selecting Find Overlapping Genes from the main toolbar
Select Ensembl Transcripts release 75 from the Report regions from the specified database options
You can select a name for the new list; we have named it gene-list
Select OK
A new spreadsheet will be created as a child spreadsheet (Figure 2)
Figure 2. Annotating the differentially methylated CpG loci with genes
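To make the default promoter definition concrete, the sketch below computes a window of 5000 bp upstream and 3000 bp downstream of a transcription start site and checks whether a CpG region overlaps it. The function names are hypothetical, and the strand handling (upstream means lower coordinates on the plus strand and higher coordinates on the minus strand) is an assumption about how such a window is usually built, not a description of Partek's internals.

```python
def promoter_window(tss, strand, upstream=5000, downstream=3000):
    """Return (start, end) of the promoter window around a TSS.

    Assumes "upstream" is relative to the direction of transcription.
    """
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    if strand == "-":
        return max(0, tss - downstream), tss + upstream
    raise ValueError("strand must be '+' or '-'")


def overlaps(region_start, region_end, window):
    """True if the closed interval [region_start, region_end] intersects window."""
    window_start, window_end = window
    return region_start <= window_end and region_end >= window_start


# Made-up coordinates: a plus-strand gene with its TSS at position 10000.
window = promoter_window(10000, "+")
hit = overlaps(4000, 5200, window)    # region reaches into the window
miss = overlaps(1000, 4999, window)   # region ends just before the window
```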
Partek Genomics Suite offers several tools to help interpret this list of genes. First, let's look at Gene Set Analysis.
Select Gene Set Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow
Select GO Enrichment for Select the method of analysis
Select Next >
Select 1/mvalue/lcls_vs_b_cells_cpg_islands/gene-list (gene-list.txt) for the source spreadsheet
Select Next >
Select Invoke gene ontology browser on the result and leave the rest of the options set to defaults for Configure the parameters of the test (Figure 3)
Figure 3. Configuring the parameters of the test
Select Next >
Select Default Mapping File for Select the method of mapping genes to genes sets
Select Next >
A new spreadsheet will be created with categories ranked by enrichment score, and the Gene Ontology Browser will launch to graphically display the results of the spreadsheet (Figure 4). The results show which gene sets are overrepresented in the list of genes overlapped by differentially methylated CpG loci between the experimental and control groups.
Figure 4. GO enrichment browser showing gene groups overrepresented in the list of genes which overlap with differentially methylated CpG loci
To get a better idea of whether genes associated with these GO terms have increased or decreased methylation, we can view the Forest Plot.
Select the Forest Plot tab
GO terms are listed by the number of significantly up-regulated genes, with the percentages of up-regulated and down-regulated genes shown as red and green bars. Here, we see that most GO terms show increased methylation in their associated genes (Figure 5).
Figure 5. Gene Ontology Forest Plot
Next, we can perform Pathway Analysis to see which pathways are overrepresented in the genes overlapped by differentially methylated CpG loci.
Select gene-list from the spreadsheet tree
Select Pathway Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow
Select Pathway Enrichment for Select the method of analysis
Select Next >
Select 1/mvalue/lcls_vs_b_cells_cpg_islands/gene-list (gene-list.txt) for the source spreadsheet
Select Next >
Leave the default selections for the Configure parameters of the test panel
Select Next >
Leave the default selections for the Result File and Select the parameters panels
Select Next > to run the analysis
The Pathway-Enrichment spreadsheet will be added to the spreadsheet tree in Partek Genomics Suite and the Partek® Pathway™ software will open to provide visualization of the most significantly enriched pathway as a pathway diagram (Figure 6). The color of the gene boxes reflects p-values of the associated differentially methylated CpG loci (bright orange is insignificant, blue is highly significant). The Color by option can be changed to another column from the gene-list.txt spreadsheet, such as Difference.
Figure 6. Partek Pathway illustrating one of the pathways overrepresented in the list of genes overlapping the differentially methylated CpG sites.
The zipped project file contains several prepared files used in this analysis as well as the annotation information for the BeadChip. The zipped file also contains a Partek project file (.ppj).
After downloading the file, go to File > Import > Zipped project... and browse to GO_Enrichment.zip on your local drive
Partek Genomics Suite will automatically unzip the file, read the .ppj file, open and annotate all spreadsheets (Figure 1). The parent spreadsheet (GSE8479-AVGSignal) contains the original intensity data. The first child spreadsheet (ANOVAResults) contains the results of differential gene expression analysis from a 3-way ANOVA. The second child spreadsheet (Gene_List.txt) is a list of significantly differentially expressed genes. When working with your own data, you will need to detect differentially expressed genes and create a gene list yourself.
Figure 1. Viewing the Gene List spreadsheet
Gene Ontology (GO) enrichment analysis has been incorporated into the gene expression, microRNA expression, exon, copy number, tiling, ChIP-Seq, RNA-Seq, miRNA-Seq and methylation workflows. The Gene Ontology Consortium provides an excellent overview for new and experienced users of GO analysis. In brief, the common nomenclature of genes and gene products has been used to group genes into a functional hierarchy. This enables analyses to be compared across all types of genomic data, even data from different species. A broader understanding of experimental results is possible by grouping genes of interest by biological process, cellular component and molecular function. With the GO enrichment tool in Partek® Genomics Suite® you can take a list of genes (e.g. significantly differentially expressed genes) and see how they group in the functional hierarchy. This is analogous to going from looking at individual trees (genes) to seeing how the whole forest (gene ontology) is organized.
This tutorial illustrates how to:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
This tutorial will provide a step-by-step guide to performing GO enrichment analysis. The data set used is based on 51 subjects run on the Illumina Human Ref-8 BeadChip platform. Twenty-six of the subjects were categorized as "Young" with an age range of 18 to 28. The other 25 subjects were categorized as "Old" with an age range of 65 to 84. Skeletal muscle, a type of striated muscle tissue, was obtained via biopsy from each subject. The total RNA was extracted from the skeletal cells, prepared and run on the BeadChips producing the data that is used for this tutorial.
The paper this data is based on can be found at .
Data and associated files for this tutorial can be downloaded by going to Help > On-line Tutorials on the main menu toolbar within the Partek Genomics Suite software. Download the zipped file and store it on your local disk drive. There is no need to manually unzip the directory.
The significant CpG loci detected in the previous step actually form a methylation signature that differentiates between LCLs and B cells. We can build and visualize this methylation signature using clustering and a heat map.
Select the LCLs_vs_Bcells_CpG_Islands spreadsheet in the spreadsheet pane on the left
Select Cluster Based on Significant Genes from the Visualization panel of the Illumina BeadArray Methylation workflow
Select Hierarchical Clustering for Specify Method (Figure 1)
Figure 1. Selecting Hierarchical Clustering for clustering method
Select OK
Verify that LCLs_vs_Bcells_CpG_Islands is selected in the drop-down menu
Verify that Standardize is selected for Expression normalization (Figure 2)
Figure 2. Selecting spreadsheet and normalization method for clustering
Select OK
The heat map will be displayed on the Hierarchical Clustering tab (Figure 3).
Figure 3. Hierarchical clustering with heat map invoked on a list of significant CpG loci
The experimental groups are rows, while the CpG loci from the LCLs vs B cells spreadsheet are columns. Methylation levels are compared between the LCLs and B cells groups. CpG loci with higher methylation are colored red; CpG loci with lower methylation are colored green. LCLs samples are colored orange and B cells samples are colored red in the dendrogram on the left-hand side of the heat map.
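The Standardize option selected before clustering can be pictured with a short sketch. It assumes that standardizing means shifting each column (CpG locus) to a mean of zero and scaling it to a standard deviation of one; the use of the population standard deviation and the handling of constant columns are illustrative choices, not confirmed details of the software.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Shift every column to mean 0 and scale it to standard deviation 1.

    rows: list of samples, each a list of values (one value per locus).
    Columns with zero spread are set to 0.0 to avoid division by zero.
    """
    scaled_columns = []
    for column in zip(*rows):
        center = mean(column)
        spread = pstdev(column)  # population SD; an assumption for this sketch
        if spread == 0:
            scaled_columns.append([0.0] * len(column))
        else:
            scaled_columns.append([(value - center) / spread for value in column])
    # Transpose back so that each row is a sample again.
    return [list(row) for row in zip(*scaled_columns)]


# Two samples, two loci (made-up values); the second locus does not vary.
standardized = standardize_columns([[1.0, 10.0], [3.0, 10.0]])
```

After this scaling every locus contributes on the same scale, so loci with large absolute methylation differences do not dominate the clustering or the heat map colors.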
Although the 450K and MethylationEPIC arrays were initially designed to analyze DNA methylation, they are essentially a dense SNP array and can be used for copy number analysis (Feber et al. 2014). The probe intensity data is easily parsed from the idat files by using the Additional Probe Data Spreadsheet Selection dialog (Figure 1) when importing the raw data. Examining the raw intensity data can also be useful for QA/QC purposes.
Follow the steps for importing Illumina methylation data until you reach the Import Illumina iDAT Data dialog with Manifest File and Output File panels (Figure 1).
Figure 1. Customizing output during data import
Select Customize... to open the Advanced Import Options dialog
Choose No normalization in the Normalization section of the Algorithm tab
Select the Outputs tab (Figure 2)
Figure 2. Selecting additional probe data to include during data import
Detection p-values. This is the confidence score that the signal of a probe was significantly higher than the background defined by negative control probes. Selecting this checkbox produces a spreadsheet ending with '_detectionp' in addition to the spreadsheet containing beta values. Each row of the _detectionp spreadsheet will be a different sample and the sample names will end in '_detectionp'. This spreadsheet can be used to filter out probes that do not show signal above background.
Probe Intensity. This is the sum of the methylated and unmethylated intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_probe’ in addition to the spreadsheet containing beta values. Each row of the _probe spreadsheet will be a different sample and the file names will also end in ‘_probe.’ The probe intensity values will be log2 transformed by default (note that the beta values are not log2 transformed).
Probe Signal. This option will become available if Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_probe.’ The methylated and unmethylated intensities are shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_meth’ or ‘_unmeth’ for methylated and unmethylated values, respectively. The probe intensity values will be log2 transformed by default.
Raw Probe Intensity. This is the sum of the raw red and green signal intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_raw’ in addition to the spreadsheet containing beta values. Each row of the spreadsheet will be a different sample and the file names will also end in ‘_raw.’ The raw probe intensity values will be log2 transformed by default.
Raw Probe Signal. This option will become available if Raw Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_raw.’ The red and green intensities will be shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_red’ or ‘_green’ for red and green values, respectively. The raw probe intensity values will be log2 transformed by default.
Antilog Probe Intensity Values. Selecting this checkbox will show the probe intensity data without log2 transformation.
Create NCBI GEO Submission Spreadsheets. Generates matrix processed and matrix signal intensities spreadsheets for GEO submission.
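As a small illustration of the Probe Intensity and Antilog Probe Intensity Values options described above, the sketch below sums the methylated and unmethylated signals of one probe and optionally applies the log2 transformation. The function name and the example signal values are hypothetical.

```python
import math


def probe_intensity(methylated, unmethylated, log2_transform=True):
    """Total probe intensity: methylated plus unmethylated signal.

    With log2_transform=True the sum is reported in log2 space (the default
    described for the _probe spreadsheet); with False the untransformed
    (antilog) value is returned.
    """
    total = methylated + unmethylated
    return math.log2(total) if log2_transform else total


log_value = probe_intensity(3000, 1096)                        # log2(4096)
raw_value = probe_intensity(3000, 1096, log2_transform=False)  # 4096
```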
How you proceed depends on your study design. Here is an example series of steps to prepare the tutorial data set for copy number analysis:
Select Probe Intensity and Antilog Probe Intensity Values (Figure 2)
Select OK to close the Advanced Import Options dialog
Select Import to import the data and perform the selected normalization method
Select the (_probe) spreadsheet from the spreadsheet tree
Delete any samples with _detectionp names
Select Transform from the main toolbar
Select Normalize to baseline
Configure the Normalize to Baseline 1 dialog as shown (Figure 3)
Select Use control set from this spreadsheet
Set Control Category to B cells
Select Ratio to baseline from the Normalization Method section
Select After ratio apply log base 2
Select New Spreadsheet from the Configure Output section
Figure 3. Configuring normalize to baseline
Select OK to generate the spreadsheet
This spreadsheet contains copy number values per probe in log2 space (i.e. diploid = 0). Prior to performing copy number analysis, you can normalize for local GC abundance.
Select Transform
Select Adjust Based on Local GC Content...
Click OK to run Local GC Adjustment (Figure 4)
Figure 4. Adjusting for local GC content
The GC adjusted spreadsheet is the starting spreadsheet for copy number analysis. You can now switch over to the Copy number workflow, skip the Create copy number step, and begin with the Detect amplifications and deletions step. Consult the user's guide for the copy number workflow for subsequent steps.
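The normalize-to-baseline step used above (ratio to baseline followed by log base 2) can be sketched as follows. The sketch assumes that the baseline for each probe is the mean intensity across the control samples and that the input intensities are not log transformed; both are assumptions made for illustration, and the function name is hypothetical.

```python
import math
from statistics import mean


def log2_ratio_to_baseline(sample_intensities, control_intensities):
    """Per-probe log2(sample / baseline), where baseline = mean of controls.

    sample_intensities: list of untransformed intensities, one per probe.
    control_intensities: list of control samples, each a list in the same
    probe order. A value of 0 means no change relative to the controls.
    """
    ratios = []
    for probe_index, value in enumerate(sample_intensities):
        baseline = mean(ctrl[probe_index] for ctrl in control_intensities)
        ratios.append(math.log2(value / baseline))
    return ratios


# Two control samples (e.g. B cells) and one test sample, made-up values.
controls = [[100, 200], [300, 200]]
log_ratios = log2_ratio_to_baseline([200, 400], controls)
```

In this scale a probe with the same intensity as the baseline gets 0, a doubling gets 1, and a halving gets -1, which is why the diploid state corresponds to 0.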
Feber A, Guilhamon P, Lechner M, et al. Using high-density DNA methylation arrays to profile copy number alterations. Genome Biology. 2014;15(2):R30. doi:10.1186/gb-2014-15-2-r30.
For this region list, you can also calculate the average beta values for the probes in each island per sample and detect differentially methylated CpG island regions. Detailed information on how to get the average beta value for each CpG island can be found in the Determining the average values for a region list section.
Select () to launch the interactive filter
Close the temporary spreadsheet by selecting it in the file tree and selecting ()
Close the source temporary spreadsheet by selecting it in the spreadsheet tree and selecting ()
The Pathway-Enrichment spreadsheet can also be viewed in Partek Pathway by switching to the Pathway-Enrichment section of the menu tree on the left-hand side of the window. From the spreadsheet view, you can select a pathway name to visualize that pathway. Alternatively, you can open a pathway visualization in Partek Pathway from the Pathway-Enrichment spreadsheet in Partek Genomics Suite by right-clicking on a row and selecting Show pathway... from the pop-up menu. Please note that if you have closed Partek Pathway and have reopened it, you will need to import a gene list if you want to color the visualization by attributes from the gene list. For more information about using Partek Pathway, check out our .
Information about the different output options can be found by selecting the adjacent () icon.
Create sample attributes and assign samples to the groups as described in
Sample ID | Cell Type |
---|---|
GSM2452106_200483200025_R04C01 | B cells |
GSM2452107_200483200021_R01C01 | B cells |
GSM2452108_200483200021_R02C01 | B cells |
GSM2452109_200483200025_R06C01 | B cells |
GSM2452110_200483200025_R07C01 | B cells |
GSM2452111_200483200021_R08C01 | B cells |
GSM2452112_200483200021_R06C01 | B cells |
GSM2452113_200483200021_R04C01 | B cells |
GSM2452114_200483200025_R01C01 | LCLs |
GSM2452115_200483200025_R03C01 | LCLs |
GSM2452116_200483200021_R03C01 | LCLs |
GSM2452117_200483200025_R05C01 | LCLs |
GSM2452118_200483200025_R02C01 | LCLs |
GSM2452119_200483200021_R07C01 | LCLs |
GSM2452120_200483200021_R05C01 | LCLs |
GSM2452121_200483200025_R08C01 | LCLs |
Sample ID | Gender |
---|---|
GSM2452106_200483200025_R04C01 | Female |
GSM2452107_200483200021_R01C01 | Female |
GSM2452108_200483200021_R02C01 | Male |
GSM2452109_200483200025_R06C01 | Female |
GSM2452110_200483200025_R07C01 | Female |
GSM2452111_200483200021_R08C01 | Female |
GSM2452112_200483200021_R06C01 | Female |
GSM2452113_200483200021_R04C01 | Male |
GSM2452114_200483200025_R01C01 | Female |
GSM2452115_200483200025_R03C01 | Female |
GSM2452116_200483200021_R03C01 | Male |
GSM2452117_200483200025_R05C01 | Female |
GSM2452118_200483200025_R02C01 | Female |
GSM2452119_200483200021_R07C01 | Female |
GSM2452120_200483200021_R05C01 | Female |
GSM2452121_200483200025_R08C01 | Male |
Before performing pathway enrichment, we need to create a gene list from the ANOVA results.
Select Gene Expression from the workflows drop-down menu
Select the ANOVAResults gene spreadsheet
Select Create Gene List from the Analysis section of the Gene Expression workflow
Select Brain vs. Heart from the List Manager dialog (Figure 1) leaving the other options as defaults
Select Create
Figure 1. Configuring the list manager dialog
A new list of 420 genes will be created as a child spreadsheet of 1 (ANOVAResults gene).
Select Close to exit the List Manager dialog
Select the new gene list, Brain vs. Heart
Select Pathway Analysis from the Biological Interpretation section of the Gene Expression workflow
Select Next > to continue with Pathway Enrichment
Pathway Enrichment is the only option available for a gene list. To learn more about the other option, Pathway ANOVA, see the Gene Ontology ANOVA tutorial, which follows the same procedure as Pathway ANOVA.
Select Next > to continue with the Brain vs. Heart spreadsheet
Select Next > to continue with default settings for Fisher's Exact test
Select Next > to continue with Homo sapiens and 4. Gene Symbol as parameters
Partek Pathway will now open. If this is your first time using Partek Pathway on the selected species, Partek Pathway will automatically download the KEGG information needed for the analysis. Once the pathway enrichment calculation has been performed, a new spreadsheet, Pathway-Enrichment.txt, will be added as a child spreadsheet of Brain vs. Heart and Partek Pathway will launch (Figure 2).
Figure 2. Partek Pathway displaying the most significantly enriched pathway from the gene list
The pathway currently displayed has the highest enrichment score. Both Partek Genomics Suite and Partek Pathway offer options for analyzing the results of pathway enrichment analysis. The next two sections of the user guide will show the options for analyzing the results of pathway enrichment in each program.
Now that the data has been imported, we need to make a few changes to the data annotation before analysis.
Notice that the Sample ID names in column 1 are gray (Figure 1). This indicates that Sample ID is a text factor. Text factors cannot be used as a variable in downstream analysis so we need to change Sample ID to a categorical factor.
Figure 1. Viewing the imported data in a spreadsheet
Right-click on the column header to invoke the pop-up menu
Select Properties (Figure 2)
Figure 2. Changing column properties
Configure the Properties of Column 1 in Spreadsheet 1 dialog as shown (Figure 3) with Type set to categorical and Attribute set to factor
Figure 3. Changing column 1 properties
Select OK
The sample names in column 1 are now black, indicating that they have been changed to a categorical variable. Next, we will add attributes for grouping the data.
From the RNA-seq workflow panel, select Add Sample Attributes to bring up the Add Sample Attributes dialog (Figure 4)
Figure 4. Add Sample Attributes dialog
Select Add a Categorical Attribute
Select OK to bring up the Create categorical attribute dialog
Creating a categorical sample attribute allows us to group samples. This is useful for designating samples as replicates, as members of an experimental group, or as sharing a phenotype of interest. In this tutorial, we have four different samples from different tissues and different donors, but to illustrate the available statistical analysis options, we need to divide the samples into two groups: muscle (Heart and Muscle) and not muscle (Brain and Liver).
Set Attribute name: as Tissue
Rename Group 1 to muscle and Group 2 to not muscle
Select and drag the samples from the Unassigned panel to the correct group panel (Figure 5)
Figure 5. Creating a categorical attribute
Select OK
Select No from the Add another attribute? dialog
Select Yes from the Save spreadsheet 1 dialog
The attribute will now appear as a new column in the RNA-seq spreadsheet with the heading Tissue and the groups muscle and not muscle.
The next available step in the Import panel of the RNA-seq workflow is Choose Sample ID Column. Verifying that the correct column is designated as the Sample ID becomes particularly important when data from multiple experiments is being combined.
Select Choose Sample ID Column from the Import panel of the RNA-Seq workflow
Select OK (Figure 6)
Figure 6. Choosing the correct column as Sample ID
We are now ready to measure gene expression in our dataset. To do this, we will use the mRNA quantification task in the Analyze Known Genes section of the RNA-Seq workflow. mRNA quantification creates spreadsheets showing expression at exon, transcript, and gene levels and reports raw and normalized reads for each sample.
Please note that the normalization method used by Partek Genomics Suite is Reads Per Kilobase per Million mapped reads (RPKM) (Mortazavi et al. 2008). In brief, this normalization method counts total reads in a sample, divides by one million to create a per million scaling factor for each sample; then divides the read counts for the feature (exon, transcript, or gene) by the per million scaling factor to normalize for sequencing depth and give a reads per million value; and finally divides reads per million values by the length of the feature (exon, transcript, or gene) in kilobases to normalize for feature size.
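The RPKM calculation described above can be written out in a few lines. This is a sketch of the published formula as summarized in the paragraph; the function and parameter names are hypothetical, and the numbers in the example are made up.

```python
def rpkm(feature_reads, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one feature.

    feature_reads: reads assigned to the exon, transcript, or gene.
    feature_length_bp: length of the feature in base pairs.
    total_mapped_reads: total reads counted for the sample.
    """
    per_million_factor = total_mapped_reads / 1_000_000
    reads_per_million = feature_reads / per_million_factor  # depth normalization
    length_kb = feature_length_bp / 1_000
    return reads_per_million / length_kb                    # size normalization


# 500 reads on a 2 kb transcript in a sample with 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
```

Here the sample has a per-million factor of 10, so 500 reads become 50 reads per million, and dividing by the 2 kb length gives an RPKM of 25.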
Select 1 (RNA-Seq) from the spreadsheet tree
Select mRNA quantification in the Analyze Known Genes section of the RNA-seq workflow
The RNA-Seq Quantification dialog will appear (Figure 1).
Select RefSeq Transcripts 2017-05-02 from the mRNA section of the Specify a database of genomic features to quantify panel of the dialog
Your choices in the Configure the test panel of the dialog depend on the design and aims of your experiment. A detailed description of each option can be viewed by selecting the () icon next to it.
For Strand-specificity: select No
Your choice here depends on the method used for sample preparation. A directional mRNA-seq sample preparation protocol only synthesizes the first strand of cDNA whereas other methods reverse transcribe the mRNA into double-stranded cDNA. If double-stranded cDNA has been synthesized, the sequencer reads sequences from both the forward and reverse strands but does not discriminate between them, eliminating strand information. When strand information is preserved, it is possible for paired-end sequences to come from a combination of the forward and reverse strands. If in doubt, select Auto-detect from the drop-down list. The data for this tutorial did not preserve strand information so we selected No.
For In the gene-level result report intronic reads as compatible with the gene?, select No
Selecting Yes would include intronic reads in the gene-level results, which might be useful for discovering unannotated transcripts for known genes, and also includes introns in the RPKM calculation for the gene-level results.
For Require strict paired-end compatibility select No
Selecting Yes would require that two alignments from the same read must map to the same transcript to be considered compatible. However, the data set used in this tutorial consists of single-end reads so this option is unnecessary.
For Report results with no reads from any sample?, select No
Selecting Yes would include all the genes/transcripts/exons in the transcriptome, even if there are no reads for that feature from any sample.
Make sure Report unexplained regions with more than ___ reads is selected and specify 5 as the number of reads
This option will create a spreadsheet that includes all regions with a specified number of reads that map to the genome, but not to any feature included in the selected database of genomic features.
Select Report exon-level results
If selected, spreadsheets will be created describing expression at the exon level.
Your RNA-Seq Quantification dialog should now be configured as shown (Figure 1). Descriptions of the spreadsheets that can be created by mRNA Quantification can be viewed by selecting Describe results to bring up the Quantification Result Help dialog.
Figure 1. Configuring the RNA-Seq Quantification dialog
Select OK to perform the RNA-Seq quantification
Reads will now be assigned to individual transcripts of a gene based on the Expectation/Maximization (E/M) algorithm (Xing et al. 2006). In Partek Genomics Suite software, the E/M algorithm is modified to accept paired-end reads, junction aligned reads, and multiple aligned reads if these are present in your data. For a detailed description of the E/M algorithm, refer to the RNA-Seq white paper (Help > On-line Tutorials > White Papers). Several spreadsheets containing the analyzed results will be generated. Progress bars in the lower left-hand corner of the RNA-Seq Quantification window and the main window will update as the data is analyzed.
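To give an intuition for how an expectation-maximization approach can divide ambiguous reads among transcripts, here is a deliberately simplified sketch. It is not the modified E/M algorithm used by Partek Genomics Suite: it ignores transcript length, paired ends, and junction reads, and the function name and input format (one set of compatible transcript indices per read) are hypothetical.

```python
def em_transcript_fractions(read_compatibility, n_transcripts, iterations=100):
    """Estimate the fraction of reads originating from each transcript.

    read_compatibility: list with one set per read, holding the indices of
    the transcripts that read is compatible with.
    """
    fractions = [1.0 / n_transcripts] * n_transcripts
    for _ in range(iterations):
        expected_counts = [0.0] * n_transcripts
        # E-step: split each read among its compatible transcripts in
        # proportion to the current abundance estimates.
        for compatible in read_compatibility:
            weight_sum = sum(fractions[t] for t in compatible)
            if weight_sum == 0:
                continue
            for t in compatible:
                expected_counts[t] += fractions[t] / weight_sum
        # M-step: re-estimate abundances from the expected read counts.
        total = sum(expected_counts)
        if total == 0:
            break
        fractions = [count / total for count in expected_counts]
    return fractions


# Made-up gene with two isoforms: 3 reads unique to isoform 0,
# 1 read unique to isoform 1, and 4 reads compatible with both.
reads = [{0}] * 3 + [{1}] * 1 + [{0, 1}] * 4
fractions = em_transcript_fractions(reads, n_transcripts=2)
```

In this toy case the uniquely assigned reads set the 3:1 ratio, and the shared reads end up divided in the same proportion, so the estimates settle at 0.75 and 0.25.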
If you have not disabled it, the Quantification Result Help dialog will appear. Select Close
The Analysis tab now shows the spreadsheets created by mRNA Quantification in the spreadsheet tree as a child spreadsheet of 1 (RNA-seq) (Figure 2).
Figure 2. Viewing the results of mRNA Quantification
The _reads and _rpkm spreadsheets
Data on features - genes, transcripts, and exons - are presented before and after normalization as _reads and _rpkm spreadsheets. In this tutorial, we have created exon_reads, exon_rpkm, gene_reads, gene_rpkm, transcript_reads, and transcript_rpkm spreadsheets. In these spreadsheets, samples are listed one per row and the normalized counts of the reads mapped to features are in columns (Figure 2).
The _reads and _rpkm spreadsheets can be used for data analysis. Sample grouping can be visualized using PCA. Select View > Scatter Plot from the toolbar or press on the quick action bar to create a PCA plot from the selected spreadsheet. See Exploring gene expression data for an example of using PCA plots for data analysis or consult Chapter 7 of the Partek User's Manual for a detailed introduction to PCA. With replicates in a sample group, you would also be able to use the _rpkm spreadsheet to perform differential expression analysis using ANOVA.
The transcripts spreadsheet
The transcripts spreadsheet lists a transcript in each row.
Even without replicates, it is possible to derive basic information about differential and alternative splicing between your samples from the RNA-Seq_result.transcripts spreadsheet using simple chi-squared or log-likelihood tests, because each sample is represented only once and we can assume a null hypothesis that the transcripts are evenly distributed across all samples. However, when you do have biological replicates, the power of Partek Genomics Suite software resides in the implementation of a mixed-model ANOVA that can handle unbalanced and incomplete datasets, nested designs, numerical and categorical variables, any number of factors, and flexible linear contrasts.
The unexplained_regions spreadsheet
The contents of this spreadsheet are explained in more detail in a later section of the tutorial - Analyzing the unexplained regions spreadsheet.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5:621-628.
Xing Y, Yu T, Wu YN, Roy M, Kim J, Lee C. An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs. Nucleic Acids Res. 2006;34:3150-3160.
RNA-Seq is a high-throughput sequencing technology used to generate information about a sample’s RNA content. Partek Genomics Suite offers convenient visualization and analysis of the high volumes of data generated by RNA-Seq experiments.
This tutorial illustrates:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
In this tutorial, you will analyze an RNA-Seq experiment using the Partek Genomics Suite software RNA-Seq workflow. The data used in this tutorial was generated from mRNA extracted from four diverse human tissues (skeletal muscle, brain, heart, and liver) from different donors and sequenced on the Illumina® Genome Analyzer™. The single-end mRNA-Seq reads were mapped to the human genome (hg19), allowing up to two mismatches, using Partek Flow alignment and the default alignment options. The output files of Partek Flow are BAM files which can be imported directly into Partek Genomics Suite 7.0 software. BAM or SAM files from other alignment programs like ELAND (CASAVA), Bowtie, BWA, or TopHat are also supported. This same workflow will also work for aligned reads from any sequencing platform in the (aligned) BAM or SAM file formats.
Data and associated files for this tutorial can be downloaded by going to Help > On-line Tutorials from the Partek Genomics Suite main menu or using this link - RNA-Seq Data Analysis tutorial files. Once the zipped data directory has been downloaded to your local drive:
Unzip the downloaded files to C:\Partek Training Data\RNA-seq or to a directory of your choosing. Be sure to create a directory or folder to hold the contents of the zip file
Pathway enrichment generates a results spreadsheet, Pathway-Enrichment.txt, visible in both Partek Genomics Suite (Figure 1) and in Partek Pathway.
Figure 1. The pathway enrichment spreadsheet is visible in both Partek Genomics Suite (shown here) and Partek Pathway
The spreadsheet includes 13 columns with information for each pathway represented in the source gene list.
1. Pathway Name - the name of the KEGG pathway
2. Database - the source database for the pathway annotation
3. Enrichment score - the negative natural log of the enrichment p-value derived from the contingency table (Fisher's Exact test) or the Chi-squared test
4. Enrichment p-value - the enrichment p-value derived from the contingency table (Fisher's Exact test) or the Chi-squared test
5. % genes in pathway that are present - the percentage of genes from the pathway that are present in the source gene list
6. Tissue score, 7. Replicate score, 8. Brain vs. Heart score - for each factor, interaction, and contrast in the ANOVA results spreadsheet, a separate score is calculated. This is derived from the negative log (base 10) of the average p-value for genes within the pathway for each factor. A high score indicates that the genes that fall into the pathway have a low p-value for the given factor.
9. # genes in list, in pathway - number of genes from the list in the pathway
10. # genes not in list, in pathway - number of genes from the pathway, not in the list
11. # genes in list, not in pathway - number of genes in list, not in the pathway
12. # genes not in list, not in pathway - number of genes that are in neither the pathway nor the list but are included in KEGG database pathways for the species
13. Pathway ID - KEGG pathway ID
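The enrichment p-value and enrichment score (columns 3 and 4) can be approximated from the four counts in columns 9 to 12. The sketch below is not Partek's implementation: it assumes a one-sided (over-representation) Fisher's Exact test, and the function names and example counts are hypothetical.

```python
from math import comb, log


def fisher_enrichment_p(in_list_in_path, not_list_in_path,
                        in_list_not_path, not_list_not_path):
    """One-sided Fisher's Exact p-value for over-representation (2x2 table)."""
    pathway_total = in_list_in_path + not_list_in_path
    list_total = in_list_in_path + in_list_not_path
    grand_total = pathway_total + in_list_not_path + not_list_not_path
    # Hypergeometric tail: probability of an overlap at least as large
    # as the one observed, given the fixed row and column totals.
    upper = min(pathway_total, list_total)
    tail = sum(
        comb(pathway_total, k) * comb(grand_total - pathway_total, list_total - k)
        for k in range(in_list_in_path, upper + 1)
    )
    return tail / comb(grand_total, list_total)


def enrichment_score(p_value):
    """Enrichment score as described above: negative natural log of the p-value."""
    return -log(p_value)


# Hypothetical counts (columns 9-12), for illustration only.
p = fisher_enrichment_p(12, 60, 188, 5740)
print(p, enrichment_score(p))
```

A larger overlap between the list and the pathway gives a smaller p-value and therefore a higher enrichment score.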
In Partek Genomics Suite, we can view several new options that are available for each pathway (row) in the Pathway-Enrichment.txt spreadsheet.
Right-click the row header of any row in the Pathway-Enrichment.txt spreadsheet (Figure 2)
Figure 2. The Pathway-Enrichment.txt spreadsheet in Partek Genomics Suite
The new options include:
Export genes in pathway, which creates a child spreadsheet of Pathway-Enrichment.txt that contains all the genes from the selected pathway(s) (Figure 3). This new spreadsheet includes gene symbols and their pathway.
Figure 3. Spreadsheet with all genes in pathway. Includes gene symbols and pathway.
Export genes in list and in pathway, which creates a child spreadsheet of Pathway-Enrichment.txt that contains the genes from your list that are present in the selected pathway(s) (Figure 4). This new spreadsheet includes gene symbols and their pathway.
Figure 4. Spreadsheet with genes only in list and pathway. Includes gene symbols and pathway.
Create Gene List, which creates a new child spreadsheet of the ANOVA results spreadsheet that contains the genes from your list that are present in the selected pathway(s) (Figure 5). This new spreadsheet includes all information for each gene from the ANOVA results spreadsheet. However, this list does not indicate the pathway of each gene.
Figure 5. Spreadsheet with genes in list and pathway. Includes all information from ANOVA results for each gene.
Show Pathway, which opens the selected pathway map in Partek Pathway.
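The difference between the three export options above comes down to which set of genes is kept and which columns travel with them. The following sketch illustrates this with plain Python sets; the function names, pathway names, gene symbols, and p-values are hypothetical placeholders, not Partek functions or tutorial data.

```python
def export_genes_in_pathway(pathways, selected):
    """(gene, pathway) rows for every gene in the selected pathways."""
    return sorted((g, p) for p in selected for g in pathways[p])


def export_genes_in_list_and_pathway(pathways, selected, gene_list):
    """(gene, pathway) rows restricted to genes that are also in the gene list."""
    return sorted((g, p) for p in selected for g in pathways[p] if g in gene_list)


def create_gene_list(anova_rows, pathways, selected):
    """ANOVA rows for listed genes in the selected pathways (no pathway column)."""
    in_selected = set().union(*(pathways[p] for p in selected))
    return {g: row for g, row in anova_rows.items() if g in in_selected}


# Toy data, for illustration only.
toy_pathways = {"P1": {"A", "B", "C"}, "P2": {"C", "D"}}
toy_anova = {"A": {"p": 0.01}, "C": {"p": 0.2}, "E": {"p": 0.5}}

print(export_genes_in_pathway(toy_pathways, ["P1"]))
print(export_genes_in_list_and_pathway(toy_pathways, ["P1", "P2"], set(toy_anova)))
print(create_gene_list(toy_anova, toy_pathways, ["P2"]))
```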
We will be using the RNA-Seq workflow to analyze RNA-Seq data throughout this tutorial. The commands included in the RNA-Seq workflow are also available from the command toolbar, but may be labeled differently.
Select the RNA-Seq workflow by selecting it from the Workflow drop-down menu in the upper right-hand corner of the Partek Genomics Suite window (Figure 1)
Figure 1. Selecting the RNA-seq workflow
The Partek Genomics Suite software can import next generation sequencing data that has been aligned to a reference genome. Two standard types of alignment formats can be imported: .BAM and .SAM. It is also possible to convert ELAND .txt files to .BAM files with the converter found in the Tools menu in the main command bar. The data used in this tutorial was aligned using the Partek® Flow® software and saved as .BAM files.
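For readers unfamiliar with the alignment formats, the sketch below shows what a SAM record contains. It is independent of Partek Genomics Suite: it reads tab-delimited SAM text, skips header lines and unmapped reads (FLAG bit 0x4), and counts reads per reference sequence. The function name and toy records are hypothetical, and the NM tag (edit distance) is used here only as a rough stand-in for a mismatch count.

```python
def count_mapped_reads(sam_lines, max_edit_distance=2):
    """Count mapped reads per reference, skipping reads whose NM tag is too high."""
    counts = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4:
            continue  # read is unmapped
        nm = None
        for tag in fields[11:]:  # optional fields follow the 11 mandatory ones
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
        if nm is not None and nm > max_edit_distance:
            continue
        counts[rname] = counts.get(rname, 0) + 1
    return counts


# Toy SAM records, for illustration only.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:2",
    "r3\t0\tchr2\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:3",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_reads(toy_sam))
```

BAM files hold the same records in compressed binary form, so they require a dedicated reader rather than plain text parsing.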
To import the .BAM files, select Import and Manage Samples from the Import section of the RNA-Seq workflow. The Sequence Import dialog box will open (Figure 2)
Figure 2. Importing .BAM files
Select BAM Files (*.bam) from the Files of type drop-down menu if not set by default
Use the file browser panel on the left-hand side of the Sequence Import dialog or select Browse... to navigate to the folder where you stored the tutorial .BAM files
Files with checked boxes next to the file name will be imported. For this tutorial, select brain_fa, heart_fa, liver_fa, and muscle_fa
Select OK to confirm the file selection and open the next dialog (Figure 3)
Figure 3. Viewing the Sequence Import wizard; specify Output file (and directory using Browse), Species, and Genome
Configure the dialog as shown (Figure 3)
Output file provides a name for the top-level spreadsheet. Browse can be used to change the output directory.
Select Homo sapiens from the Species drop-down menu
This will allow us to select a human genome reference assembly alignment.
Select hg19 for Genome/Transcriptome reference used to align the reads
This is the reference genome our tutorial data was aligned to using Partek Flow.
Select OK to open the BAM Sample Manager dialog (Figure 4)
Figure 4. Bam Sample Manager dialog
The Bam Sample Manager dialog allows additional samples to be added or removed after the initial sample import. To remove a sample, select a sample from the list and then select Remove selected samples. This dialog also allows us to modify samples.
Select Manage samples to open the Assign files to samples dialog
Sample ID is by default set to the file name, which may be too long or uninformative, so the Assign files to samples dialog can be used to give informative names to samples.
Change the sample names to Brain, Heart, Liver, and Muscle as shown (Figure 5)
The Assign files to samples dialog also allows multiple .BAM files to be merged into one sample. This is useful if reads from one sample are split into multiple .BAM files.
Figure 5. Changing sample names using the Assign files to samples dialog
Select OK to close the Assign files to samples dialog
Select Close to exit the Bam Sample Manager dialog and view the imported data (Figure 6)
Figure 6. Viewing the imported data in a spreadsheet
Additional files can be added to this spreadsheet using the Bam Sample Manager dialog. The Bam Sample Manager dialog can also be used to add imported samples to a separate spreadsheet by selecting a new option in the dialog, Add new experiment.
Partek Pathway is a separate program from Partek Genomics Suite with a distinct user interface (Figure 1).
Figure 1. Partek Pathway
The Project Elements panel (Figure 2) displays the selected pathway, the original gene list, the Pathway Enrichment spreadsheet, and the library references that were used for the pathway analysis. The Project Elements panel is used to navigate between open pathway diagrams and spreadsheets.
Figure 2. Project Elements panel
Select the Brain vs. Heart spreadsheet under Gene Lists
The Brain vs. Heart gene list we created earlier will open (Figure 3). The spreadsheet can be sorted by any column by left-clicking a column header; the first click will sort by ascending values, the second click will switch to descending values.
Figure 3. Viewing a gene list in Partek Pathway
Select the Pathway-Enrichment spreadsheet under Pathway Lists
The Pathway-Enrichment.txt spreadsheet will open (Figure 4). This spreadsheet has the same contents as the Pathway-Enrichment.txt spreadsheet in Partek Genomics Suite. Selecting any of the pathway names will open its pathway diagram. The spreadsheet can be sorted by any column.
Figure 4. Viewing the Pathway Enrichment spreadsheet in Partek Pathway
Select the GABAergic synapse pathway on the Pathway-Enrichment spreadsheet or the Project Elements panel
The GABAergic synapse pathway diagram will open. Genes in the pathway are shown as boxes. The color of the boxes is set by the Configuration panel (Figure 5).
Figure 5. The Configuration panel
Any numerical column from the source gene list can be used to color the gene boxes. While significant p-values indicate a difference between the categories, they give no information about upregulation or downregulation of the pathway. We can overlay fold-change information on the pathway diagram.
Select Brain vs. Heart: Fold-Change(Brain vs. Heart) from the drop-down menu
The pathway diagram now shows fold change for each gene in the pathway included in the gene list (Figure 6). Genes not in the gene list remain black.
Figure 6. GABAergic synapse pathway diagram showing fold-changes for genes in the gene list
The colors and range can be changed using the Color By panel.
Select the red color square next to Max
Select yellow from the color picker interface
Select OK
Select the green color square next to Min
Select pale blue from the color picker interface
Select OK
We can see that all the colored genes in the GABAergic synapse pathway are yellow (Figure 7), indicating that they are up-regulated.
Figure 7. Changing colors in the Pathway Diagram; up-regulated genes are yellow and down-regulated genes are teal
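One way to picture how a numerical column is turned into box colors is a gradient between the Min and Max colors. The sketch below is an assumption about the general idea, not Partek Pathway's actual algorithm: it uses a simple linear two-color gradient, clamps values to the chosen range, and leaves genes without a value black. The function name and RGB values are hypothetical.

```python
def fold_change_color(value, vmin, vmax, min_rgb, max_rgb):
    """Map a value to an RGB color on a linear gradient from min_rgb to max_rgb."""
    if value is None:
        return (0, 0, 0)  # gene not in the gene list: stays black
    clamped = max(vmin, min(vmax, value))
    t = (clamped - vmin) / (vmax - vmin)  # 0.0 at Min, 1.0 at Max
    return tuple(round(a + t * (b - a)) for a, b in zip(min_rgb, max_rgb))


PALE_BLUE = (173, 216, 230)  # assumed RGB for the Min color
YELLOW = (255, 255, 0)       # assumed RGB for the Max color

for fc in (None, -4.0, 0.5, 4.0):
    print(fc, fold_change_color(fc, -4.0, 4.0, PALE_BLUE, YELLOW))
```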
We can select a gene to learn more about it.
Select the Selection Mode icon to activate Selection Mode
Right-click GABAB (Figure 8)
Figure 8. Learn more about any gene on a pathway diagram by right-clicking
Options available include:
Look up on KEGG - opens the KEGG page for the pathway on GenomeNet (genome.jp) in your web browser
Ensembl - under External Links, opens the page for the selected Ensembl ID on ensembl.org
UniProt - under External Links, opens the page for the selected UniProt ID on uniprot.org
Jump to ___ on "___" - opens the source gene list in Partek Pathway to the row of the selected gene
Selecting the Navigation Mode icon activates Navigation Mode. This enables navigation on large pathway diagrams by left-clicking and dragging to move the view.
The pathway database can be searched for genes of interest using the Search panel.
Select the search icon to open the Search panel
Type NSF in the search box
Select the search button to search
Pathways containing NSF appear on the right-hand side in the Search Results section in alphabetical order (Figure 9).
Figure 9. Using the search panel to find pathways containing a gene of interest
If multiple species or libraries have been loaded, the Filter Options section on the left-hand side can be used to choose which species and libraries to search.
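Conceptually, the search described above is a lookup of a gene symbol across pathway gene sets, optionally filtered by species, with the hits returned in alphabetical order. The sketch below illustrates that idea only. The function name, the toy library, and its contents are hypothetical and do not reflect actual KEGG pathway membership.

```python
def search_pathways(library, gene, species=None):
    """Return names of pathways containing `gene`, sorted alphabetically."""
    hits = [
        name
        for name, entry in library.items()
        if gene in entry["genes"]
        and (species is None or entry["species"] in species)
    ]
    return sorted(hits)


# Toy library, for illustration only.
toy_library = {
    "Toy pathway B": {"species": "Homo sapiens", "genes": {"GENE_Y"}},
    "Toy pathway A": {"species": "Mus musculus", "genes": {"NSF"}},
    "Synaptic vesicle cycle": {"species": "Homo sapiens", "genes": {"NSF", "GENE_X"}},
}

print(search_pathways(toy_library, "NSF"))
print(search_pathways(toy_library, "NSF", species={"Homo sapiens"}))
```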
Double click on Synaptic vesicle cycle in the Search Results section
The selected pathway, Synaptic vesicle cycle, will open in the Pathway Diagram panel (Figure 10).
Select the minimize icon to minimize the Search panel
Figure 10. Opening a pathway diagram from search results
On the right-hand side of the Partek Pathway window, we see the Pathway Detail panel (Figure 11).
Figure 11. The Pathway Detail panel
Select KEGG_Gene to open the list of genes in the pathway
Selecting a gene in the list will highlight it in the pathway diagram (Figure 12).
Figure 12. Selecting a gene in a pathway diagram using the Pathway Detail panel
Another way to select and view a pathway is browsing the Pathway Libraries.
Select the Pathway Libraries icon in the upper left-hand corner of the Partek Pathway window
The Pathway Libraries dialog will open (Figure 13).
Figure 13. Downloading and browsing pathway libraries
In the upper section of the dialog, you can view available KEGG libraries and download them by selecting the Download Library link. Selecting a pathway opens it in the lower section of the dialog.
In the lower section of the dialog, you can view a list of all the pathways in the selected pathway library. You can also open any pathway from the selected library in the Pathway Diagram panel.
Select Adherens Junction
Select View Pathway to open the pathway diagram
We can use the Project Elements panel to close an open pathway diagram or list.
Right-click Adherens Junction in the Project Elements panel
Select Delete from the pop-up menu to close the diagram (Figure 14)
Figure 14. Closing a pathway diagram
The Search panel and Pathway Libraries can also be used to open pathway diagrams for pathways without any open gene or pathway lists.
One of the main functions of GO enrichment is to find the overrepresentation of functional categories in a gene list. With the Gene_List.txt spreadsheet selected:
From the Gene Expression workflow, choose Biological Interpretation followed by Gene Set Analysis
Select the GO Enrichment radio button in the Gene Set Analysis dialog (Figure 1) followed by Next
In the next dialog, make sure the Gene_List.txt spreadsheet is chosen from the drop-down list and click Next
Figure 1. Gene set analysis dialog. Choose GO Enrichment and select Next
You have the choice to use the Fisher's Exact or Chi-Square test. Both tests compare the proportion of a gene list in a functional group to the proportion of genes in the background for that group. Both are acceptable and you can always try both by re-running the analysis. You can also restrict the analysis to functional groups with more than or fewer than a specified number of genes. Restricting the analysis to GO groups with fewer than 150-200 genes will increase the speed of analysis and exclude large groups, which may not be very informative. If analysis time is not a concern, you can just use the default settings.
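To make the comparison of proportions concrete, the sketch below computes a Pearson Chi-Square test on a 2x2 table (list vs. background, in group vs. not in group) using only the standard library. It is an illustration under stated assumptions rather than Partek's implementation: no continuity correction is applied, and the function name and counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(in_list_in_group, in_list_not_group,
                   not_list_in_group, not_list_not_group):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    a, b = in_list_in_group, in_list_not_group
    c, d = not_list_in_group, not_list_not_group
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 20 of 100 listed genes fall in the group,
# compared with 80 of 900 background genes.
stat, p = chi_square_2x2(20, 80, 80, 820)
print(stat, p)
```

The Chi-Square approximation becomes unreliable when expected cell counts are small, which is one reason Fisher's Exact test is often preferred for small functional groups.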
Select the Use Fisher's Exact test radio button (Figure 2)
Make sure the Invoke gene ontology browser on the result check box is selected
Leave all other settings as default and click Next
Figure 2. Configure the parameters of the GO enrichment test
Select the Default mapping file radio button and click Next
Figure 3. For an explanation of the different kinds of mapping file supported, click the help icon next to each one
A new spreadsheet (Figure 4) and the gene ontology browser (Figure 5) will appear.
Figure 4. GO enrichment output spreadsheet. Right-click a row header to perform additional tasks on a chosen GO term
The new spreadsheet (GO-Enrichment.txt) is a child spreadsheet of the gene list. The first column contains the GO functional groups, each of which falls into a broader category (biological process, cellular component, or molecular function), shown in column 2. The GO functional groups are sorted by descending enrichment score, which is shown in column 3. The enrichment score is the negative natural logarithm of the enrichment p-value, which is shown in column 4. The higher the enrichment score, the more overrepresented a functional group is in the gene list. If a functional group has an enrichment score of over 1, it is overrepresented. As a rule of thumb, an enrichment score of 3 corresponds to significant overrepresentation (p-value of approximately 0.05). For your data, you may wish to add a multiple test correction (e.g. FDR) by going to Stat > Multiple Test correction. We will not perform the multiple test correction for this tutorial.
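The relationship between the enrichment score and the p-value, and the effect of a multiple test correction, can be checked with a few lines of Python. This is a sketch under assumptions: the Benjamini-Hochberg procedure shown here is one common FDR method and may not match the correction applied by Stat > Multiple Test correction, and the function names and p-values are hypothetical.

```python
from math import exp, log


def score_to_p(score):
    """Invert the enrichment score: p = exp(-score)."""
    return exp(-score)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(-log(0.05))       # score corresponding to p = 0.05
print(score_to_p(3.0))  # p-value corresponding to a score of 3
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```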
Additional columns help describe the enrichment score for each group, including the percentage of genes in the group that are present in the gene list, the number of genes in the group that are present in the list, and the total number of genes in the group. Because the original gene list was derived from statistical analysis, extra columns will appear for all p-values in the ANOVA model. For example, the Young/Old score and Gender score columns contain the negative natural logarithm of the geometric mean of p-values for each marker/gene present in the list and in the group. These scores represent the level of differential expression of the genes in the functional group. The larger the score, the more differentially expressed the genes are in the group. A score of 3 or greater corresponds to an average p-value of approximately 0.05 or less. For example, the Young/Old score explains how differentially expressed the genes present in the list and in a given group are between the "Young" and "Old" categories.
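Because the negative natural logarithm of a geometric mean equals the arithmetic mean of the negative log p-values, the per-factor score described above can be sketched in one line. The function name and p-values below are hypothetical, and this is an illustration of the stated formula rather than Partek's code.

```python
from math import log


def group_score(p_values):
    """-ln(geometric mean of p-values) = mean of -ln(p) over the genes."""
    return -sum(log(p) for p in p_values) / len(p_values)


# Hypothetical p-values for the genes of one functional group.
print(group_score([0.05, 0.05]))
print(group_score([0.001, 0.02, 0.3]))
```

A single very small p-value raises the score less than it would under an arithmetic mean of the p-values, since each gene contributes its log p-value equally.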
On the GO-Enrichment.txt spreadsheet, right-click on a row header of a functional group, such as hydrogen ion transmembrane transporter activity, which has an enrichment score of 29.9484, and choose Browse to GO term from the menu
Figure 5. Viewing a functional group on the gene ontology browser
The Gene Ontology Browser opens in a separate tab (Figure 5). A functional group viewed in the browser will show the hierarchical relationship to the other GO terms. The selected functional group will be highlighted on the left. In Figure 5, you can see the hydrogen ion transmembrane transporter activity is found in the tree molecular function > transporter activity > substrate-specific transmembrane transporter activity > cation-transporting ATPase activity > inorganic cation transmembrane transporter activity > monovalent inorganic cation transmembrane activity. On the right, a bar chart shows the sub-groups of the selected group and their respective enrichment scores.
On the GO-Enrichment.txt spreadsheet, right-click on a row header of a functional group and choose Term Details from the menu
A web browser will be opened and you will be re-directed to the AmiGO website, where you will find more details about the chosen GO term (internet connection required).
On the GO-Enrichment.txt spreadsheet, right-click on a row header of a functional group and choose Create Gene List from the menu
A new spreadsheet (gene-list) that contains the genes in the original list that belong to the chosen functional group will be created (Figure 6). Note that this spreadsheet is a Partek temporary (ptmp) file. To save it, click the Save icon.
Figure 6. New gene list spreadsheet containing all the genes in the original list that belong to the chosen functional group
Partek Genomics Suite supports different types of mapping files (Figure 3). These are library files that define how genes are organized into functional groups. For an explanation of each type of mapping file, click on the help icon next to each one.