Gene Expression Analysis with Batch Effects

This tutorial will illustrate:

Importing the data set
Adding an annotation link
Exploring the data set with PCA

Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

Description of the Data Set

The data for this tutorial is taken from an experiment that examined the effects of four treatment conditions at two time points on estrogen receptor-positive breast cancer cell lines in vitro. Each treatment/time combination has two replicates and there are two control samples for a total of eighteen samples. Gene expression analysis was performed using the Affymetrix GeneChip® Human U95A array. Values are transformed to log base 2 scale by f(x) = log2(x+1).
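
The transformation f(x) = log2(x+1) can be sketched outside of Partek Genomics Suite as follows. This is only an illustration: the function name and the raw intensity values are hypothetical and are not taken from the tutorial data set. Adding 1 before taking the logarithm keeps a raw value of 0 defined (it maps to 0).

```python
import numpy as np


def log2_plus_one(intensities):
    """Apply f(x) = log2(x + 1) element-wise to raw intensity values."""
    values = np.asarray(intensities, dtype=float)
    return np.log2(values + 1.0)


# Hypothetical raw probeset intensities for a single sample
raw = np.array([0.0, 1.0, 3.0, 1023.0])
transformed = log2_plus_one(raw)
print(transformed)  # values are 0, 1, 2 and 10
```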

Additional Assistance

If you need additional assistance, please visit to submit a help ticket or find phone numbers for regional support.

Importing the data set

The original experiment is listed on the Gene Expression Omnibus as GSE848; however, this tutorial uses only a subset of the original experiment, which should be downloaded from the Partek website tutorial page, Gene Expression Analysis with Batch Effects.

Download the zipped project folder, Breast_Cancer-GE.zip
Unzip the project folder to C:/Partek Training Data/ or a directory of your choosing

This location should be easily accessible. The unzipped Breast_Cancer-GE project folder and a zipped annotation file will be added to the selected directory.

Unzip the included annotation file, HG_U95Av2.na32.annot.rar
Move the annotation file, HG_U95Av2.na32.annot, to the microarray libraries folder

By default, the microarray libraries folder will be located at C:/Microarray Libraries, but the location may vary depending on your operating system and configuration.

Open Partek Genomics Suite
Select () from the main command bar
Navigate to the tutorial folder, Breast_Cancer-GE

Figure 1. Opening a data file. The red Partek Genomics Suite icon is shown next to the data file (FMT file format)

The spreadsheet will open as 1 (Breast_Cancer.txt) (Figure 2).

Figure 2. Breast_Cancer.txt data file

The summary at the bottom of the spreadsheet shows there are 18 rows and 12,631 columns in the spreadsheet. The first column contains the Filename listing the GEO GSM number. This is also an identifier for the microarray. Treatment, Time, and Batch are in columns 2, 3, and 4, respectively. Column 6 marks the beginning of the probesets. The data is log2 transformed.

Adding an annotation link

While many types of data sets are automatically linked with appropriate annotation files upon import, if this does not occur, a spreadsheet can be manually linked with an annotation file.

Right-click Breast_Cancer.txt in the spreadsheet tree
Select Properties (Figure 1)

Figure 1. Selecting file properties for a spreadsheet

Configure the Configure Genomic Properties dialog as shown (Figure 2) with the following steps:

Select Gene Expression from the Choose the type of genomic data drop-down menu
Select Feature in column label
Select Browse...

Figure 2. Configure the genomic properties dialog as shown

There is now an * after the spreadsheet name in the spreadsheet tree. This indicates an unsaved change has been made to the spreadsheet.

Select () to save the changes

Exploring the data set with PCA

Principal Components Analysis (PCA) is an excellent method to visualize similarities and differences between the samples in a data set. PCA can be invoked through a workflow, by selecting () from the main command bar, or by selecting Scatter Plot from the View section of the main toolbar. We will use a workflow.

Select Gene Expression from the Workflows drop-down menu
Select PCA Scatter Plot from the QA/QC section of the Gene Expression workflow
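
To see what a PCA scatter plot is built from, the sketch below computes principal component scores with a singular value decomposition. It is a generic illustration, not Partek's implementation: the function name `pca_scores`, the synthetic 18 x 200 matrix, the 9/9 batch split, and the size of the batch shift are all assumptions made for the example.

```python
import numpy as np


def pca_scores(data, n_components=3):
    """Project samples (rows) onto the top principal components."""
    centered = data - data.mean(axis=0)  # center each probeset column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for the spreadsheet: 18 samples x 200 probesets
data = rng.normal(size=(18, 200))
batch = np.array(["A"] * 9 + ["B"] * 9)
data[batch == "B"] += 1.5  # hypothetical shift affecting every probeset in batch B

scores, explained = pca_scores(data)
# Samples from the two batches fall on opposite sides of PC1
print(scores[batch == "A", 0].mean(), scores[batch == "B", 0].mean())
```

Each row of `scores` gives the coordinates of one sample in the scatter plot; samples that share a large source of variation, such as a batch, end up close together.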

Detect differentially expressed genes with ANOVA

Analysis of variance (ANOVA) is a very powerful technique for identifying differentially expressed genes in a multi-factor experiment. In this data set, ANOVA will be used to generate a list of genes that are significantly differentially regulated by each treatment.

Adding factors and interactions

When setting up the ANOVA, the primary factors of interest, Treatment and Time, should be included. We will also include the interaction between Treatment and Time, Treatment * Time, because we are interested in whether different treatments behave differently over time. From our exploratory analysis using PCA, we also know that Batch

Removing batch effects

By including Batch in the ANOVA model, the variability due to the batch effect is accounted for when calculating p-values for the non-random factors. In this sense, the batch effect has already been removed. However, visualizing biological effects can be very difficult if batch effects are present in the original intensity data used to generate visualizations. We can modify the original intensity data to remove the batch effect using the Remove Batch Effect tool.

Using the Remove Batch Effect tool

The Remove Batch Effect tool functions much like ANOVA in reverse, calculating the variation attributed to the factor being removed then adjusting the original intensity values to remove the variation. Once the variation caused by the batch effect has been removed, tools like PCA or clustering can be used to visualize what the data would look like if the batch effect was not present.
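
The idea of "ANOVA in reverse" can be sketched with an ordinary least-squares fit: estimate the contribution of the unwanted factor for each probeset, then subtract only that contribution. This is a minimal sketch under stated assumptions, not the algorithm used by the Remove Batch Effect tool. The helper names (`one_hot`, `remove_factor`), the single kept factor, and the synthetic balanced data are hypothetical.

```python
import numpy as np


def one_hot(labels):
    """Dummy-code a factor, dropping the first level as the reference."""
    levels = sorted(set(labels))
    return np.column_stack(
        [[1.0 if label == level else 0.0 for label in labels] for level in levels[1:]]
    )


def remove_factor(data, keep, remove):
    """Subtract the fitted effect of `remove` while modelling `keep`.

    data: samples x probesets matrix of log2 intensities.
    keep, remove: one label per sample for each factor.
    """
    n_samples = data.shape[0]
    x_keep = one_hot(keep)
    x_remove = one_hot(remove)
    x_remove = x_remove - x_remove.mean(axis=0)  # center so the overall mean is preserved
    design = np.column_stack([np.ones(n_samples), x_keep, x_remove])
    coef, *_ = np.linalg.lstsq(design, data, rcond=None)
    remove_coef = coef[1 + x_keep.shape[1]:]  # rows belonging to the removed factor
    return data - x_remove @ remove_coef


# Synthetic, balanced example: 8 samples x 5 probesets
rng = np.random.default_rng(1)
treatment = ["Control", "E2"] * 4
batch = ["A"] * 4 + ["B"] * 4
is_e2 = np.array([t == "E2" for t in treatment], dtype=float)
is_b = np.array([b == "B" for b in batch], dtype=float)
data = 8.0 + 2.0 * is_e2[:, None] + 1.5 * is_b[:, None] + rng.normal(scale=0.1, size=(8, 5))

adjusted = remove_factor(data, keep=treatment, remove=batch)
```

In this balanced example the batch means coincide after adjustment, while the difference between E2 and Control is unchanged. Including the factors of interest in the fit matters: it prevents biological signal from being attributed to the batch and removed with it.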

Select the 1 (Breast_Cancer.txt) spreadsheet
Select Stat from the main toolbar
Select Remove Batch Effect... (Figure 1)

Figure 1. Invoking the Remove Batch Effect tool

The Remove Batch Effects dialog will open. The tool functions by performing an ANOVA, then modifying the original intensity values to remove the effects of the specified factor(s).

Select Treatment, Time, and Batch
Select Add Factor > to add them to the ANOVA Factor(s) panel
Select Batch in the ANOVA Factor(s) panel

By default, the results will be displayed in a new spreadsheet. Options to overwrite the current spreadsheet and specify the output file appear at the bottom of the dialog (Figure 2).

Figure 2. Configuring the Remove Batch Effects tool to remove Batch and create a new spreadsheet

Select OK

The new spreadsheet, 1-removeresult (batch-remove), will open in the Analysis tab (Figure 3).

Figure 3. Viewing the new spreadsheet with batch effects removed

Batch effects in PCA

We can visualize the effects of removing the batch effects using PCA.

Select 1 (Breast_Cancer.txt) from the spreadsheet tree
Select () to plot the PCA scatter plot
Select ()

Figure 4. Adding a centroid for Batch

Select OK to close the Add Centroid... dialog
Select OK to close the Configure Plot Properties dialog

The two centroids are distinct, showing the batch effect (Figure 5).

Figure 5. Viewing a batch effect using PCA. The batches are shown as the pink (A) and yellow (B) centroids. The clear separation of the centroids indicates a batch effect

Repeat the above steps for 1-removeresult (batch-remove)

For 1-removeresult (batch-remove), the centroids of the two batches overlap, showing that the batch effect has been removed (Figure 6).

Figure 6. Overlapping centroids for batches A and B show that the batch effect has been removed.
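
The visual comparison of centroids can also be expressed as a number: the distance between the batch centroids in principal component space. The sketch below assumes PCA scores are already available as a samples x components array. The function name and the four example points are hypothetical.

```python
import numpy as np


def group_centroids(scores, labels):
    """Return the mean position of each group in principal component space."""
    labels = np.asarray(labels)
    return {str(group): scores[labels == group].mean(axis=0) for group in np.unique(labels)}


# Hypothetical PC1/PC2 scores for four samples from two batches
scores = np.array([[-2.0, 0.5], [-1.0, -0.5], [1.0, 0.5], [2.0, -0.5]])
batch = ["A", "A", "B", "B"]

centroids = group_centroids(scores, batch)
distance = np.linalg.norm(centroids["A"] - centroids["B"])
print(distance)  # 3.0 for these example points
```

A distance that is large relative to the spread of the samples corresponds to the separated centroids in Figure 5; a distance near zero corresponds to the overlapping centroids in Figure 6.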

Batch effects in ANOVA results visualizations

Visualization of ANOVA results for single probe(sets)/genes also benefits from batch removal. To illustrate this, we first need to repeat our ANOVA using the new batch-remove intensity values spreadsheet.

Select the Analysis tab
Select 1-removeresult (batch-remove) in the spreadsheet tree
Select Stat from the main toolbar

Figure 7. Configuring ANOVA to compare treatment groups to control

Select OK to add contrasts
Change output file name to ANOVAResults_batch-remove
Select OK to perform the ANOVA

The ANOVAResults_batch-remove spreadsheet will open in the Analysis tab.

Select the ANOVAResults spreadsheet
Right-click on the row header for row 2, TFF1
Select Dot Plot (Orig. Data) (Figure 8)

Figure 8. Invoking a dot plot from the ANOVAResults spreadsheet

A dot plot for trefoil factor 1 (TFF1) will open (Figure 9). The dot plot shows gene intensity values (y-axis) for each sample. Samples are grouped by Treatment.

Figure 9. Viewing the dot plot for trefoil factor 1 (TFF1) across different treatment groups

To visualize the batch effect we will make a few changes to the plot.

Select H/V to switch the horizontal and vertical axes
Select ()
Set Color to Batch

Figure 10. Configuring the dot plot (part 1 of 2)

Select the Labels tab
Select Column for In Point Labels
Select Time from the Column drop-down list (Figure 11)

Figure 11. Configuring the dot plot (part 2 of 2)

Select OK

The dot plot now clearly shows the batch effect (Figure 12). Samples within treatment groups are separated clearly between the two batches shown in blue and red.

Figure 12. Viewing a dot plot showing a batch effect. Each dot is a sample. The y-axis is treatment combinations; the x-axis is the expression value of the TFF1 gene. Dots are colored by batch, sized by time, connected by treatment combination, and labeled by time.

To view the effects of batch removal, we can view this dot plot for the ANOVAResults_batch-remove spreadsheet.

Select the Analysis tab
Select ANOVA-3way (ANOVAResults_batch-remove) from the spreadsheet tree
Repeat the steps shown above to create the dot plot for trefoil factor 1

The dot plot invoked from the ANOVAResults_batch-remove spreadsheet shows that the batch effect has been removed: the samples no longer separate clearly by color within treatment groups (Figure 13).

Figure 13. Viewing the dot plot that shows batch effect removal. The plot configuration matches Figure 12.

Creating a gene list using the Venn Diagram

The List Manager can be used to generate lists of genes by applying criteria such as fold change and false discovery rate (FDR) adjusted p-value thresholds.

Select the Analysis tab
Select ANOVAResults in the spreadsheet tree
Select Create Gene List from the Analysis section of the Gene Expression workflow (Figure 1)

Figure 1. Selecting Create Gene List from the Gene Expression workflow

Select E2 vs. Control from the Contrast panel of the ANOVA Streamlined tab in the List Manager dialog
Deselect the Include size of the change option
Set p-value with FDR < to 0.1 (Figure 2)

Figure 2. Configuring the List Manager using the ANOVA Streamlined filtering options

There should be ~545 probe(sets)/genes that meet this threshold.
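
For readers who want to see what an FDR-adjusted p-value threshold does, the sketch below applies the Benjamini-Hochberg step-up adjustment and keeps the entries below 0.1. That the List Manager uses exactly this procedure is an assumption. The function name and the p-values are made up for illustration.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)  # p * m / rank
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


# Hypothetical unadjusted p-values for five probesets
pvalues = [0.001, 0.01, 0.03, 0.04, 0.5]
adjusted = bh_adjust(pvalues)
passing = adjusted < 0.1  # same cutoff as in the step above
print(adjusted, passing.sum())
```

Here the adjusted values are 0.005, 0.025, 0.05, 0.05, and 0.5, so four of the five hypothetical probesets pass the 0.1 cutoff.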

Select Create

A new spreadsheet, E2 vs. Control, will be added as a child spreadsheet of Breast_Cancer.txt.

Repeat the steps listed above to create lists for E2+ICI vs. Control (~24 genes), E2+Ral vs. Control (~22 genes), and E2+TOT vs. Control (~177 genes) with the same threshold

Now we can use the Venn Diagram to create a list of genes that are differentially regulated in all treatment groups.

Select the Venn Diagram tab in the List Manager dialog

The Venn Diagram shows overlap between selected gene lists.

Select the four created lists (E-H) in the spreadsheet list in the List Manager dialog by selecting each while holding the Ctrl key on your keyboard

The Venn Diagram will display the number of overlapping and distinct genes from the four lists (Figure 3).

Figure 3. Viewing the Venn Diagram with intersections of four lists of significant genes

The intersection of the four ellipses shows that 14 differentially regulated genes are common to the four treatment schemes.
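
The central region of the Venn Diagram is simply the intersection of the four gene lists. A minimal sketch with Python sets is shown below. The list names follow the tutorial, but the gene identifiers are placeholders and do not reflect the actual contents of the lists.

```python
# Placeholder gene identifiers; the real lists contain the probe(sets)/genes
# that passed the FDR threshold for each contrast.
gene_lists = {
    "E2 vs. Control": {"gene_01", "gene_02", "gene_03", "gene_04"},
    "E2+ICI vs. Control": {"gene_01", "gene_02", "gene_05"},
    "E2+Ral vs. Control": {"gene_01", "gene_02", "gene_06"},
    "E2+TOT vs. Control": {"gene_01", "gene_02", "gene_03"},
}

# Genes present in every list (the region where all four ellipses overlap)
common = set.intersection(*gene_lists.values())
print(sorted(common))  # ['gene_01', 'gene_02']
```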

Select the region intersecting all four ellipses
Right-click the intersected region
Select Create List From Highlighted Regions

The new list will appear in the spreadsheet tree with a temporary file name (ptpm).

Select the temporary list in the spreadsheet tree
Select () from the command bar
Save the list as fourtreatments

Hierarchical clustering using a gene list

Opening a gene list as a child spreadsheet

Gene lists can be visualized and their ability to distinguish samples evaluated using a hierarchical clustering heat map. Because of the batch effect in this data set, we will perform hierarchical clustering using batch-corrected intensity values. To do this, we need to open the fourtreatments list of differentially expressed genes as a child spreadsheet of the batch-remove spreadsheet.

Select fourtreatments from the spreadsheet tree
Select () to close the spreadsheet
Select 1-removeresult (batch-remove) from the spreadsheet tree
Select File from the main toolbar
Select Open as child...
Select fourtreatments using the file browser

The fourtreatments spreadsheet will open as a child spreadsheet of batch-remove (Figure 1).

Figure 1. The fourtreatments spreadsheet is open as a child spreadsheet of batch-remove. Visualizations performed using fourtreatments will pull intensity values from batch-remove.

Visualizations performed using the fourtreatments spreadsheet will now use intensity values from the batch-remove spreadsheet.

Hierarchical clustering using a gene list

To invoke hierarchical clustering, follow the steps below.

Select Cluster Based on Significant Genes from the Visualization section of the Gene Expression workflow
Select Hierarchical Clustering
Select OK

Figure 2. Configuring the Cluster the significant genes dialog

Select OK

The hierarchical clustering heat map will open in a new tab (Figure 3).

Figure 3. Hierarchical clustering of genes with significantly different expression across the treatment groups

Genes without changes in expression are given a value of zero and are colored black. Up-regulated genes have positive values and are displayed in red. Down-regulated genes have negative values and are displayed in green. Each sample is represented in a row while genes are represented as columns. Dendrograms illustrate clustering of samples and genes. To learn more about configuring the hierarchical clustering heat map, see the user guide.

For detailed information about the methods used for clustering, refer to the Partek Manual Chapter 8: Hierarchical & Partitioning Clustering.

GO enrichment using a gene list

Gene Ontology (GO) enrichment analysis compares a gene list to lists of genes associated with biological processes, cellular compartments, and molecular functions to provide biological insights. Once a list of genes has been created, it is possible to see which GO terms the genes are associated with and whether any GO terms are significantly enriched in the gene list.
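
One common way to test whether a GO term is over-represented in a gene list is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below illustrates that general idea only; whether it matches the default test in the Gene Set Analysis dialog is an assumption. The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(list_size, hits_in_list, term_size, background_size):
    """Probability of seeing at least `hits_in_list` term genes in the list.

    list_size: number of genes in the gene list
    hits_in_list: genes in the list annotated with the GO term
    term_size: genes in the background annotated with the GO term
    background_size: all genes on the array that could have been selected
    """
    max_hits = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 4 of 4 listed genes carry a term held by 5 of 10 background genes
p_value = enrichment_pvalue(list_size=4, hits_in_list=4, term_size=5, background_size=10)
print(round(p_value, 4))  # 0.0238
```

A small p-value means the observed overlap would be unlikely if the gene list were drawn at random from the background.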

Select the E2 vs. Control spreadsheet from the spreadsheet tree
Select Gene Set Analysis from the Biological Interpretation section of the Gene Expression workflow
Select Next > to continue with GO Enrichment
Select Next > to continue with 1/E2_vs_Control (E2 vs. Control)
Select Next > to continue with default parameter settings
Select Next > to continue with the default mapping file

A new spreadsheet 1 (GO-Enrichment.txt) will open as a child spreadsheet of E2 vs. Control (Figure 1).

Figure 1. GO Enrichment results spreadsheet

GO terms are shown in rows and are sorted by ascending enrichment p-value.

To visualize the results, we can launch the Gene Ontology Browser.

Select View from the main toolbar
Select Gene Ontology Browser

The Gene Ontology Browser will open in a new tab (Figure 2).

Figure 2. Viewing GO enrichment results in the Gene Ontology Browser

The bar chart shows the GO terms with the highest enrichment scores for the gene list.

To learn more about GO enrichment and using the Gene Ontology Browser, please consult the tutorial.
