The original experiment is listed on the Gene Expression Omnibus as GSE848; however, this tutorial only uses a subset of the original experiment and should be downloaded from the Partek website tutorial page, Gene Expression Analysis with Batch Effects.
Download the zipped project folder, Breast_Cancer-GE.zip
Unzip the project folder to C:/Partek Training Data/ or a directory of your choosing
This location should be easily accessible. The unzipped Breast_Cancer-GE project folder and a zipped annotation file will be added to the selected directory.
Unzip the included annotation file, HG_U95Av2.na32.annot.rar
Move the annotation file, HG_U95Av2.na32.annot, to the microarray libraries folder
By default, the microarray libraries folder will be located at C:/Microarray Libraries, but the location may vary depending on your operating system and configuration.
Open Partek Genomics Suite
Select () from the main command bar
Navigate to the tutorial folder, Breast_Cancer-GE
Select Breast_Cancer.txt
Select Open (Figure 1)
Figure 1. Opening a data file. The red Partek Genomics Suite icon is shown next to the data file (FMT file format)
The spreadsheet will open as 1 (Breast_Cancer.txt) (Figure 2).
Figure 2. Breast_Cancer.txt data file
The summary at the bottom of the spreadsheet shows that there are 18 rows and 12,631 columns. The first column, Filename, lists the GEO GSM number, which also serves as an identifier for the microarray. Treatment, Time, and Batch are in columns 2, 3, and 4, respectively. Column 6 marks the beginning of the probesets. The data is log2 transformed.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Gene lists can be visualized and their ability to distinguish samples evaluated using a hierarchical clustering heat map. Because of the batch effect in this data set, we will perform hierarchical clustering using batch-corrected intensity values. To do this, we need to open the fourtreatments list of differentially expressed genes as a child spreadsheet of the batch-remove spreadsheet.
Select fourtreatments from the spreadsheet tree
Select () to close the spreadsheet
Select 1-removeresult (batch-remove) from the spreadsheet tree
Select File from the main tool bar
Select Open as child...
Select fourtreatments using the file browser
The fourtreatments spreadsheet will open as a child spreadsheet of batch-remove (Figure 1).
Figure 1. The fourtreatments spreadsheet is open as a child spreadsheet of batch-remove. Visualizations performed using fourtreatments will pull intensity values from batch-remove.
Visualizations performed using the fourtreatments spreadsheet will now use intensity values from the batch-remove spreadsheet.
To invoke hierarchical clustering, follow the steps below.
Select Cluster Based on Significant Genes from the Visualization section of the Gene Expression workflow
Select Hierarchical Clustering
Select OK
Select 1-removeresult/1 (fourtreatments) from the drop-down menu
Select Standardize for Expression normalization (Figure 2)
Figure 2. Configuring the Cluster the significant genes dialog
Select OK
The hierarchical clustering heat map will open in a new tab (Figure 3).
Figure 3. Hierarchical clustering of genes with significantly different expression across the treatment groups
Genes without changes in expression are given a value of zero and are colored black. Up-regulated genes have positive values and are displayed in red. Down-regulated genes have negative values and are displayed in green. Each sample is represented in a row while genes are represented as columns. Dendrograms illustrate clustering of samples and genes. To learn more about configuring the hierarchical clustering heat map, see the Hierarchical Clustering Analysis user guide.
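The zero-centered color scale described above follows from standardizing each gene before drawing the heat map. The sketch below assumes that Standardize means shifting each gene to a mean of zero and scaling it to a standard deviation of one; the function name and the sample values are hypothetical, and the exact formula used by Partek Genomics Suite (for example, sample versus population standard deviation) is not stated in this tutorial.

```python
import statistics

def standardize_gene(values):
    """Shift one gene's values to mean 0 and scale to standard deviation 1.

    Assumption: the sample standard deviation is used. A gene with no
    variation maps to all zeros, which a heat map would draw in black.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

# Made-up log2 intensities for one gene across three samples.
z_scores = standardize_gene([8.0, 10.0, 12.0])
print(z_scores)
```

Positive values would be drawn in red and negative values in green, matching the color scheme described for the heat map.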
For detailed information about the methods used for clustering, refer to the Partek Manual Chapter 8: Hierarchical & Partitioning Clustering.
This tutorial will illustrate:
Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
The data for this tutorial is taken from an experiment that examined the effects of four treatment conditions at two time points on estrogen receptor-positive breast cancer cell lines in vitro. Each treatment/time combination has two replicates and there are two control samples for a total of eighteen samples. Gene expression analysis was performed using the Affymetrix GeneChip® Human U95A array. Values are transformed to log base 2 scale by f(x) = log2(x+1).
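As a quick illustration of the transform f(x) = log2(x+1), the following sketch applies it to a few made-up raw intensities and shows how to map a transformed value back to the linear scale. The function names and input values are hypothetical and are not part of the tutorial data.

```python
import math

def log2_plus_one(intensity):
    """Map a raw intensity x to log2(x + 1); an intensity of 0 maps to 0."""
    return math.log2(intensity + 1.0)

def inverse_log2_plus_one(value):
    """Recover the raw intensity from a log2(x + 1) value."""
    return 2.0 ** value - 1.0

# Made-up raw intensities, chosen so that x + 1 is a power of two.
raw_intensities = [0.0, 1.0, 255.0, 1023.0]
transformed = [log2_plus_one(x) for x in raw_intensities]
print(transformed)  # [0.0, 1.0, 8.0, 10.0]
```

Adding 1 before taking the logarithm keeps zero intensities defined, since log2(0) does not exist.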
Principal Components Analysis (PCA) is an excellent method to visualize similarities and differences between the samples in a data set. PCA can be invoked through a workflow, by selecting () from the main command bar, or by selecting Scatter Plot from the View section of the main toolbar. We will use a workflow.
Select Gene Expression from the Workflows drop-down menu
Select PCA Scatter Plot from the QA/QC section of the Gene Expression workflow
The PCA scatter plot will open as a new tab (Figure 1).
Figure 1. Viewing the PCA scatter plot. Each point is a sample. Samples are colored by treatment.
In this PCA scatter plot, each point represents a sample in the spreadsheet. Points that are close together in the plot are more similar, while points that are far apart in the plot are more dissimilar.
To better view the data, we can rotate the plot.
Click and drag to rotate the plot
Rotating the plot allows us to look for outliers in the data on each of the three principal components (PC1-3). The percentage of the total variation explained by each PC is listed by its axis label. The chart label shows the sum percentage of the total variation explained by the displayed PCs.
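To make the percentage labels concrete, here is a minimal sketch of how the share of variation per principal component can be computed for just two variables, using the closed-form eigenvalues of a 2 x 2 covariance matrix. This is an illustration only: the function name and values are hypothetical, and a real PCA over thousands of probesets relies on a numerical decomposition rather than this shortcut.

```python
import math

def percent_variance_2d(xs, ys):
    """Return the percent of total variance explained by PC1 and PC2
    for two variables measured on the same samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    total = var_x + var_y  # trace of the covariance matrix
    if total == 0:
        raise ValueError("the data has no variation")

    # Eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]] in closed form.
    half_gap = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    eig1 = total / 2 + half_gap
    eig2 = total / 2 - half_gap
    return 100 * eig1 / total, 100 * eig2 / total

# Two made-up genes that rise together: PC1 captures essentially everything.
pc1, pc2 = percent_variance_2d([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(round(pc1, 6), round(abs(pc2), 6))
```

The two percentages always sum to 100, in the same way that the chart label sums the variation explained by the displayed PCs.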
We can change the plot properties to better visualize the effects of different variables.
Set Shape to 4. Batch
Set Size to 3. Time
Set Connect to 5. Treatment Combination
Select OK (Figure 2)
Figure 2. Configuring plot properties to color by treatment, shape by batch, size by time, and connect by treatment combination
The PCA scatter plot now shows information about treatment, batch, and time for each sample (Figure 3).
Figure 3. PCA scatter plot showing treatment, batch, and time information for each sample. A batch effect is clearly visible.
PCA is particularly useful for identifying outliers and batch effects in data sets. We can see a batch effect in this data set because samples separate by batch. To make this clearer, we can add ellipses by Batch.
Select Ellipsoids from the tab
Select Add Ellipse/Ellipsoid
Select Ellipse
Select Batch from the Categorical Variable(s) panel and move it to the Group Variable(s) panel
Select OK
Select OK to close the dialog
The ellipses help illustrate that the data is separated by batch (Figure 4).
Figure 4. Ellipses around batch groups show that samples separate by batch
Ways to address the batch effect in the data set will be detailed later in this tutorial.
Select () to activate Rotate Mode
Select () to open the Configure Plot Properties dialog
Analysis of variance (ANOVA) is a very powerful technique for identifying differentially expressed genes in a multi-factor experiment. In this data set, ANOVA will be used to generate a list of genes that are significantly differentially regulated by each treatment.
When setting up the ANOVA, the primary factors of interest, Treatment and Time, should be included. We will also include the interaction between Treatment and Time, Treatment * Time, because we are interested in whether different treatments behave differently over time. From our exploratory analysis using PCA, we also know that Batch is a major source of variation and needs to be included. Including Batch as a random factor will allow us to account for the batch effect.
Select Detect differentially expressed genes from the Analysis section of the Gene Expression workflow
Select Treatment, Time, and Batch in the Experimental Factor(s) panel
Select Add Factor > to move the selections to the ANOVA Factor(s) panel
Select both Treatment and Time in the Experimental Factor(s) panel by holding Ctrl on the keyboard while selecting each
Select Add Interaction > to add the Treatment * Time interaction to the ANOVA Factor(s) panel (Figure 1)
Do not select OK or Apply. We still need to add linear contrasts to the ANOVA model
Figure 1. Adding factors and interactions to the ANOVA
ANOVA will output a p-value and F ratio for each factor or interaction; to get the fold-change and ratio between the different levels of a factor or interaction, linear contrasts, or comparisons, must be added.
Select Contrasts... in the ANOVA dialog (Figure 1)
Select Yes for Data is already log transformed?
Select Treatment * Time from the Select Factor/Interaction drop-down menu
We will add contrasts comparing each of the four treatment groups to the control group.
Select E2 * 8 and E2 * 48 from the Candidate Level(s) panel
Select Add Contrast Level > to move them to the top panel (Group 1) on the right-hand side
The Group 1 panel will be renamed after the contents of the panel. We can specify a name for the group.
Set Label of the top panel to E2
Select Control * 0 from the Candidate Level(s) panel
Select Add Contrast Level > to move it to the bottom panel (Group 2) on the right-hand side
Set Label of the bottom panel to Control
The lower panel (Group 2) is considered the reference level. Because the data is log2 transformed, the geometric mean will be used to calculate the fold change and mean ratio to place both on a linear scale instead of a log scale.
Select Add Contrast (Figure 2)
Figure 2. Adding a contrast between E2 vs. Control at all time points.
To examine the time points of each treatment condition separately, we can select Add Combinations instead of Add Contrast. This adds every possible contrast for the levels in the Group 1 and Group 2 panels.
Select E2 * 8 and E2 * 48 from the Candidate Level(s) panel
Select Add Contrast Level > to move them to the top panel (Group 1) on the right-hand side
Select Control * 0 from the Candidate Level(s) panel
Select Add Contrast Level > to move it to the bottom panel (Group 2) on the right-hand side
Select Add Combinations to add contrasts for E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 (Figure 3)
Figure 3. Add Combinations creates contrasts for every combination of levels from the two group panels.
For this tutorial, we will not be considering the time points of each treatment condition individually. We can remove the E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 contrasts.
Select E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 from the contrasts list
Select Delete
We will now add contrasts for the other treatment conditions.
Add contrasts for E2+ICI vs. Control, E2+Ral vs. Control, and E2+TOT vs. Control following the steps outlined for E2 vs. Control
There should now be four contrasts added to the contrasts panel (Figure 4).
Figure 4. Fully configured contrasts for the tutorial
Select OK to add the contrasts to the ANOVA model
The Contrasts... button should now read Contrasts Included in the ANOVA dialog.
Select OK to perform the ANOVA
The result of the 3-way mixed model ANOVA is displayed in a new spreadsheet, ANOVA-3way (ANOVAResults) that is a child of the Breast_Cancer.txt spreadsheet. In ANOVAResults, each row represents a probe(set)/gene with the columns containing the results of the ANOVA (Figure 5).
Figure 5. Viewing the ANOVA Results spreadsheet. Probe(sets)/genes are on rows and the ANOVA results are on columns.
By default, the rows are sorted in ascending order by the p-value of the first factor, which places the gene most significantly differentially expressed between treatments at the top of the spreadsheet.
Each factor in the ANOVA adds p-value, F value, and SS value columns. F value is a ratio of signal to noise; high values indicate that the probe(set)/gene explains variation in the data set due to the factor. SS value is the sum of squares.
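To show what a signal-to-noise F value means, the following sketch computes the F ratio for a simple one-way, fixed-effects ANOVA on a single gene. It is a deliberately reduced illustration with hypothetical names and made-up values; the tutorial's analysis is a 3-way mixed model, which is considerably more involved.

```python
def one_way_f_ratio(groups):
    """F ratio for a one-way ANOVA: mean square between groups divided by
    mean square within groups (the error term)."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Signal: variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Noise: variation of the samples around their own group mean.
    ss_within = sum(
        (v - mean) ** 2
        for group, mean in zip(groups, group_means)
        for v in group
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F ratio is undefined")

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up log2 intensities for one gene in two groups.
f_value = one_way_f_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f_value)  # 13.5
```

Here the sums of squares are 13.5 between groups and 4.0 within groups, so the F ratio is (13.5 / 1) / (4.0 / 4) = 13.5.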
Each contrast in the ANOVA adds p-value, ratio, and fold-change columns. The p-value is calculated using log space. The ratio and fold change are calculated using linear space.
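The relationship between log space and linear space can be sketched as follows. The difference of log2 means is converted back to a linear ratio, which is equivalent to a ratio of geometric means. The signed fold-change convention used here (ratios below 1 reported as negative reciprocals) is an assumption, the names and values are hypothetical, and plain group means stand in for the model-based estimates an ANOVA would use.

```python
import statistics

def contrast_ratio(group1_log2, group2_log2):
    """Linear-scale ratio of Group 1 to Group 2 (the reference level).

    The difference of log2 means equals the log2 of the ratio of
    geometric means, so raising 2 to that difference gives the ratio.
    """
    diff = statistics.mean(group1_log2) - statistics.mean(group2_log2)
    return 2.0 ** diff

def signed_fold_change(ratio):
    """Assumed convention: ratio >= 1 is kept as is; ratio < 1 becomes -1/ratio."""
    return ratio if ratio >= 1.0 else -1.0 / ratio

# Made-up log2 intensities for one gene.
treated = [9.0, 11.0]
control = [8.0, 8.0]

up_ratio = contrast_ratio(treated, control)    # 4.0
down_ratio = contrast_ratio(control, treated)  # 0.25
print(up_ratio, signed_fold_change(up_ratio))      # 4.0 4.0
print(down_ratio, signed_fold_change(down_ratio))  # 0.25 -4.0
```

Under this convention a ratio of 0.25 and a fold change of -4 describe the same four-fold decrease relative to the reference group.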
Sources of variation captured in the ANOVA can be viewed for the entire data set or for individual probe(sets)/genes.
Select View Sources of Variation from the Analysis section of the Gene Expression workflow
The Sources of Variation plot will open in a new tab (Figure 6).
Figure 6. Viewing the sources of variation plot. Non-random factors are included when ANOVA is run using the default REML model.
This plot presents the signal-to-noise ratio across all probe(sets)/genes for each of the non-random factors and interactions in the ANOVA model. The y-axis represents the average mean square or F ratio, the ANOVA measure of variance, for all the probesets. Each bar is a factor, and random error is also included. If a factor has a greater mean F ratio than Error, the factor contributed significant variation to the data set.
Note that Batch is not included as a factor. This is because Batch is a random factor and is accounted for by the ANOVA model.
The sources of variation for each probe(set)/gene can be viewed individually.
Right-click on a row header in the ANOVAResults spreadsheet
Select Sources of Variation from the pop-up menu
The plot will open in a new tab. For additional plots that can be invoked from the ANOVA results spreadsheet, see the Visualizations user guide.
By including Batch in the ANOVA model, the variability due to the batch effect is accounted for when calculating p-values for the non-random factors. In this sense, the batch effect has already been removed. However, visualizing biological effects can be very difficult if batch effects are present in the original intensity data used to generate visualizations. We can modify the original intensity data to remove the batch effect using the Remove Batch Effect tool.
The Remove Batch Effect tool functions much like ANOVA in reverse, calculating the variation attributed to the factor being removed and then adjusting the original intensity values to remove that variation. Once the variation caused by the batch effect has been removed, tools like PCA or clustering can be used to visualize what the data would look like if the batch effect were not present.
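The idea of estimating a factor's contribution and then subtracting it can be sketched for a single gene as below. This is a simplified stand-in, not the tool's actual algorithm: it assumes a balanced design in which each batch contains the same mix of treatments, so that per-batch mean offsets are a fair estimate of the batch effect. The function name and values are hypothetical.

```python
from collections import defaultdict

def remove_batch_offsets(values, batches):
    """Subtract each batch's mean offset from the grand mean for one gene.

    Assumption: the design is balanced across batches, so the difference
    between a batch mean and the grand mean reflects batch, not treatment.
    """
    grand_mean = sum(values) / len(values)

    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)

    # Estimated batch effect: how far each batch mean sits from the grand mean.
    offsets = {
        batch: sum(vals) / len(vals) - grand_mean
        for batch, vals in by_batch.items()
    }
    return [value - offsets[batch] for value, batch in zip(values, batches)]

# Made-up log2 intensities: batch B reads 1.0 higher than batch A.
intensities = [10.0, 12.0, 11.0, 13.0]
batch_labels = ["A", "A", "B", "B"]
adjusted = remove_batch_offsets(intensities, batch_labels)
print(adjusted)  # [10.5, 12.5, 10.5, 12.5]
```

After adjustment both batches share the same mean (11.5), while the differences between samples within each batch are preserved.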
Select the 1 (Breast_Cancer.txt) spreadsheet
Select Stat from the main tool bar
Select Remove Batch Effect... (Figure 1)
Figure 1. Invoking the Remove Batch Effect tool
The Remove Batch Effects dialog will open. The tool functions by performing an ANOVA and then modifying the original intensity values to remove the effects of the specified factor(s).
Select Treatment, Time, and Batch
Select Add Factor > to add them to the ANOVA Factor(s) panel
Select Batch in the ANOVA Factor(s) panel
Select Add Factor > to add Batch to the Remove Effect(s) of These Factor(s) panel
By default, the results will be displayed in a new spreadsheet. Options to overwrite the current spreadsheet and specify the output file appear at the bottom of the dialog (Figure 2).
Figure 2. Configuring the Remove Batch Effects tool to remove Batch and create a new spreadsheet
Select OK
The new spreadsheet, 1-removeresult (batch-remove) will open in the Analysis tab (Figure 3).
Figure 3. Viewing the new spreadsheet with batch effects removed
We can visualize the effects of removing the batch effects using PCA.
Select 1 (Breast_Cancer.txt) from the spreadsheet tree
Set Drawing Mode to Mixed
Select the Ellipsoids tab
Select Add Centroid
Add Batch to the Grouping Variable(s) panel
Set the colors of the two centroids to pink and yellow as shown (Figure 4)
Figure 4. Adding a centroid for Batch
Select OK to close the Add Centroid... dialog
Select OK to close the Configure Plot Properties dialog
The two centroids are distinct, showing the batch effect (Figure 5).
Figure 5. Viewing a batch effect using PCA. The batches are shown as the pink (A) and yellow (B) centroids. The clear separation of the centroids indicates a batch effect
Repeat the above steps for 1-removeresult (batch-remove)
For 1-removeresult (batch-remove), the centroids of the two batches overlap, showing that the batch effect has been removed (Figure 6).
Figure 6. Overlapping centroids for batches A and B show that the batch effect has been removed.
Visualization of ANOVA results for single probe(sets)/genes also benefits from batch removal. To illustrate this, we first need to repeat our ANOVA using the new batch-remove intensity values spreadsheet.
Select the Analysis tab
Select 1-removeresult (batch-remove) in the spreadsheet tree
Select Stat from the main toolbar
Select ANOVA...
Add Treatment, Time, and Batch factors to the ANOVA Factor(s) panel
Add Treatment * Time interaction to the ANOVA Factor(s) panel
Select Contrasts...
Select Treatment from the Select Factor/Interaction drop-down menu
Select Yes for Data is already log transformed?
Set up contrasts of treatment vs. control for E2, E2+ICI, E2+Ral, and E2+TOT (Figure 7)
Figure 7. Configuring ANOVA to compare treatment groups to control
Select OK to add contrasts
Change output file name to ANOVAResults_batch-remove
Select OK to perform the ANOVA
The ANOVAResults_batch-remove spreadsheet will open in the Analysis tab.
Select the ANOVAResults spreadsheet
Right-click on the row header for row 2, TFF1
Select Dot Plot (Orig. Data) (Figure 8)
Figure 8. Invoking a dot plot from the ANOVAResults spreadsheet
A dot plot for trefoil factor 1 (TFF1) will open (Figure 9). The dot plot shows gene intensity values (y-axis) for each sample. Samples are grouped by Treatment.
Figure 9. Viewing the dot plot for trefoil factor 1 (TFF1) across different treatment groups
To visualize the batch effect we will make a few changes to the plot.
Select H/V to switch the horizontal and vertical axis
Set Color to Batch
Set Size to Time
Set Connect to Treatment Combination (Figure 10)
Figure 10. Configuring the dot plot (part 1 of 2)
Select the Labels tab
Select Column for In Point Labels
Select Time from the Column drop-down list (Figure 11)
Figure 11. Configuring the dot plot (part 2 of 2)
Select OK
The dot plot now clearly shows the batch effect (Figure 12). Samples within treatment groups are separated clearly between the two batches shown in blue and red.
Figure 12. Viewing a dot plot showing a batch effect. Each dot is a sample. The y-axis is treatment combinations; the x-axis is the expression value of the TFF1 gene. Dots are colored by batch, sized by time, connected by treatment combination, and labeled by time.
To view the effects of batch removal, we can view this dot plot for the ANOVAResults_batch-remove spreadsheet.
Select the Analysis tab
Select ANOVA-3way (ANOVAResults_batch-remove) from the spreadsheet tree
Repeat the steps shown above to create the dot plot for trefoil factor 1
The dot plot invoked from the ANOVAResults_batch-remove spreadsheet shows that the batch effect has been removed, because samples no longer separate clearly by color within treatment groups (Figure 13).
Figure 13. Viewing the dot plot that shows batch effect removal. The plot configuration matches Figure 12.
The List Manager can be used to generate lists of genes by applying criteria such as fold change and false discovery rate (FDR) adjusted p-value thresholds.
Select the Analysis tab
Select ANOVAResults in the spreadsheet tree
Select Create Gene List from the Analysis section of the Gene Expression workflow (Figure 1)
Figure 1. Selecting Create Gene List from the Gene Expression workflow
Select E2 vs. Control from the Contrast panel of the ANOVA Streamlined tab in the List Manager dialog
Deselect the Include size of the change option
Set p-value with FDR < to 0.1 (Figure 2)
Figure 2. Configuring the List Manager using the ANOVA Streamlined filtering options
There should be ~545 probe(sets)/genes that meet this threshold.
Select Create
A new spreadsheet, E2 vs. Control, will be added as a child spreadsheet of Breast_Cancer.txt.
Repeat the steps listed above to create lists for E2+ICI vs. Control (~24 genes), E2+Ral vs. Control (~22 genes), and E2+TOT vs. Control (~177 genes) with the same threshold
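The FDR filtering applied in the steps above can be sketched with the Benjamini-Hochberg step-up procedure. That this is the method behind the p-value with FDR option is an assumption, since the tutorial does not name the procedure; the function name and p-values below are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.5]
adjusted_p = benjamini_hochberg(p_values)

# Keep genes whose adjusted p-value is below the tutorial's 0.1 threshold.
selected = [i for i, p in enumerate(adjusted_p) if p < 0.1]
print(selected)  # [0, 1, 2]
```

Each adjusted value is the smallest FDR level at which that gene would still be called significant, so filtering at 0.1 tolerates roughly 10% false discoveries in the resulting list.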
Now we can use the Venn Diagram to create a list of genes that are differentially regulated in all treatment groups.
Select the Venn Diagram tab in the List Manager dialog
The Venn Diagram shows overlap between selected gene lists.
Select the four created lists (E-H) in the spreadsheet list in the List Manager dialog by selecting each while holding the Ctrl key on your keyboard
The Venn Diagram will display the number of overlapping and distinct genes from the four lists (Figure 3).
Figure 3. Viewing the Venn Diagram with intersections of four lists of significant genes
The intersection of the four ellipses shows that 14 differentially regulated genes are common to all four treatment schemes.
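The central region of the Venn Diagram is simply the set intersection of the four gene lists. The sketch below illustrates this with placeholder gene identifiers; the list contents are made up and do not reproduce the tutorial's results.

```python
def common_genes(gene_lists):
    """Return the genes present in every list (the central Venn region)."""
    sets = [set(genes) for genes in gene_lists.values()]
    return set.intersection(*sets) if sets else set()

# Placeholder identifiers; real lists would hold probeset IDs or gene symbols.
gene_lists = {
    "E2 vs. Control": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "E2+ICI vs. Control": ["GENE_A", "GENE_C", "GENE_E"],
    "E2+Ral vs. Control": ["GENE_A", "GENE_C", "GENE_F"],
    "E2+TOT vs. Control": ["GENE_A", "GENE_B", "GENE_C"],
}
shared = common_genes(gene_lists)
print(sorted(shared))  # ['GENE_A', 'GENE_C']
```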
Select the region intersecting all four ellipses
Right-click the intersected region
Select Create List From Highlighted Regions
Select Close to exit the List Manager dialog
The new list will appear in the spreadsheet tree with a temporary file name (ptpm).
Select the temporary list in the spreadsheet tree
Save the list as fourtreatments
Select () to plot the PCA scatter plot
Gene Ontology (GO) enrichment analysis compares a gene list to lists of genes associated with biological processes, cellular components, and molecular functions to provide biological insights. Once a list of genes has been created, it is possible to see which GO terms the genes are associated with and whether any GO terms are significantly enriched in the gene list.
Select the E2 vs. Control spreadsheet from the spreadsheet tree
Select Gene Set Analysis from the Biological Interpretation section of the Gene Expression workflow
Select Next > to continue with GO Enrichment
Select Next > to continue with 1/E2_vs_Control (E2 vs. Control)
Select Next > to continue with default parameter settings
Select Next > to continue with the default mapping file
A new spreadsheet 1 (GO-Enrichment.txt) will open as a child spreadsheet of E2 vs. Control (Figure 1).
Figure 1. GO Enrichment results spreadsheet
GO terms are shown in rows and are sorted by ascending enrichment p-value.
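One common way to obtain an enrichment p-value for a single GO term is a one-sided hypergeometric test, sketched below. The tutorial does not state which test the software applies, so treat this as an illustration of the general idea; the function name, parameter names, and counts are hypothetical.

```python
from math import comb

def hypergeometric_enrichment_p(list_size, list_hits, background_size, background_hits):
    """Probability of seeing at least `list_hits` genes annotated to a GO term
    in a list of `list_size` genes drawn at random from a background of
    `background_size` genes, of which `background_hits` carry the term."""
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k) * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total

# Made-up counts: 20 background genes, 5 annotated to the term,
# and a gene list of 4 genes containing 3 annotated genes.
p_enrich = hypergeometric_enrichment_p(4, 3, 20, 5)
print(round(p_enrich, 4))  # 0.032
```

A small p-value indicates that the term appears in the gene list more often than expected by chance, which is why sorting by ascending p-value places the most enriched terms first.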
To visualize the results, we can launch the Gene Ontology Browser.
Select View from the main tool bar
Select Gene Ontology Browser
The Gene Ontology Browser will open in a new tab (Figure 2).
Figure 2. Viewing GO enrichment results in the Gene Ontology Browser
The bar chart shows the GO terms with the highest enrichment scores for the gene list.
To learn more about GO enrichment and using the Gene Ontology Browser, please consult the Gene Ontology Enrichment tutorial.
While many types of data sets are automatically linked with appropriate annotation files upon import, if this does not occur, a spreadsheet can be manually linked with an annotation file.
Right-click Breast_Cancer.txt in the spreadsheet tree
Select Properties (Figure 1)
Figure 1. Selecting file properties for a spreadsheet
Configure the Configure Genomic Properties dialog as shown (Figure 2) with the following steps:
Select Gene Expression from the Choose the type of genomic data drop-down menu
Select Feature in column label
Select Browse...
Select HG_U95Av2.na36.annot.csv from the microarray libraries folder
Select Set Column
Select Gene Symbol from the Choose column containing gene symbol/microRNA name dialog
Select Homo sapiens and hg19 from the Species and Genome Build drop-down menus
Figure 2. Configure the genomic properties dialog as shown
There is now an * after the spreadsheet name in the spreadsheet tree. This indicates an unsaved change has been made to the spreadsheet.
Select () to save the changes