Creating a gene list using the Venn Diagram

The List Manager can be used to generate lists of genes by applying criteria such as fold change and false discovery rate (FDR) adjusted p-value thresholds.

  • Select the Analysis tab

  • Select ANOVAResults in the spreadsheet tree

  • Select Create Gene List from the Analysis section of the Gene Expression workflow (Figure 1)

Figure 1. Selecting Create Gene List from the Gene Expression workflow

  • Select E2 vs. Control from the Contrast panel of the ANOVA Streamlined tab in the List Manager dialog

  • Deselect the Include size of the change option

  • Set p-value with FDR < to 0.1 (Figure 2)

Figure 2. Configuring the List Manager using the ANOVA Streamlined filtering options

There should be ~545 probe(sets)/genes that meet this threshold.

  • Select Create

A new spreadsheet, E2 vs. Control, will be added as a child spreadsheet of Breast_Cancer.txt.
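As a rough illustration of what an FDR-adjusted p-value filter does, the sketch below applies a Benjamini-Hochberg step-up adjustment to a handful of p-values and keeps the genes below the 0.1 cutoff. The step-up method is an assumption about how the adjustment is computed, and the gene names and p-values are hypothetical; this is not the List Manager's own implementation.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg (step-up) adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p-values for five genes (not tutorial data).
raw = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039, "geneD": 0.041, "geneE": 0.60}
adj = dict(zip(raw, bh_adjust(list(raw.values()))))

fdr_cutoff = 0.1  # same cutoff as the "p-value with FDR <" setting
gene_list = [gene for gene, q in adj.items() if q < fdr_cutoff]
print(gene_list)
```

With these made-up values, four of the five genes pass the cutoff; the fifth is dropped because its adjusted p-value stays at 0.6.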

  • Repeat the steps listed above to create lists for E2+ICI vs. Control (~24 genes), E2+Ral vs. Control (~22 genes), and E2+TOT vs. Control (~177 genes) with the same threshold

Now we can use the Venn Diagram to create a list of genes that are differentially regulated in all treatment groups.

  • Select the Venn Diagram tab in the List Manager dialog

The Venn Diagram shows overlap between selected gene lists.

  • Select the four created lists (E-H) in the spreadsheet list in the List Manager dialog by selecting each while holding the Ctrl key on your keyboard

The Venn Diagram will display the number of overlapping and distinct genes from the four lists (Figure 3).

Figure 3. Viewing the Venn Diagram with intersections of four lists of significant genes
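The regions of a Venn diagram correspond to set operations on the gene lists. The following minimal sketch uses hypothetical gene names (the real lists come from the four contrast spreadsheets) to show how the central region and the list-specific regions could be computed.

```python
# Hypothetical gene lists; names and memberships are illustrative only.
lists = {
    "E2": {"geneA", "geneB", "geneC", "geneD"},
    "E2+ICI": {"geneA", "geneB", "geneE"},
    "E2+Ral": {"geneA", "geneB", "geneD"},
    "E2+TOT": {"geneA", "geneB", "geneC"},
}

# Genes shared by every list: the region where all four ellipses intersect.
common = set.intersection(*lists.values())

# Genes found in exactly one list: the outer, non-overlapping regions.
unique = {
    name: genes - set.union(*(g for n, g in lists.items() if n != name))
    for name, genes in lists.items()
}

print(sorted(common))
print({name: sorted(genes) for name, genes in unique.items()})
```

Here the central region holds geneA and geneB, and only the E2+ICI list has a gene (geneE) that appears nowhere else.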

The intersection of the four ellipses shows that 14 differentially regulated genes are common to all four treatment schemes.

  • Select the region intersecting all four ellipses

  • Right-click the intersected region

  • Select Create List From Highlighted Regions

The new list will appear in the spreadsheet tree with a temporary file name (ptpm).

  • Select the temporary list in the spreadsheet tree

  • Select () from the command bar

  • Save the list as fourtreatments

  • Select Close to exit the List Manager dialog

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Gene Expression Analysis with Batch Effects

This tutorial will illustrate:

  • Importing the data set

  • Adding an annotation link

  • Exploring the data set with PCA

  • Detect differentially expressed genes with ANOVA

  • Removing batch effects

  • Creating a gene list using the Venn Diagram

  • Hierarchical clustering using a gene list

  • GO enrichment using a gene list

Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

Description of the Data Set

The data for this tutorial is taken from an experiment that examined the effects of four treatment conditions at two time points on estrogen receptor-positive breast cancer cell lines in vitro. Each treatment/time combination has two replicates and there are two control samples for a total of eighteen samples. Gene expression analysis was performed using the Affymetrix GeneChip® Human U95A array. Values are transformed to log base 2 scale by f(x) = log2(x+1).
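The transformation f(x) = log2(x+1) mentioned in the data set description can be sketched in a few lines. The raw intensities below are hypothetical and chosen so the results are round numbers.

```python
import math


def log2_plus_one(x):
    """Apply f(x) = log2(x + 1); the +1 keeps an intensity of zero defined."""
    return math.log2(x + 1)


def undo_log2_plus_one(y):
    """Invert the transformation to get back to the linear scale."""
    return 2 ** y - 1


raw_intensities = [0, 1, 3, 1023]  # hypothetical raw values
transformed = [log2_plus_one(x) for x in raw_intensities]
print(transformed)  # [0.0, 1.0, 2.0, 10.0]
```

A difference of 1 on the transformed scale corresponds to roughly a doubling of the raw intensity, which is why fold changes are later computed from differences of log2 values.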

Adding an annotation link

While many types of data sets are automatically linked with appropriate annotation files upon import, if this does not occur, a spreadsheet can be manually linked with an annotation file.

  • Right-click Breast_Cancer.txt in the spreadsheet tree

  • Select Properties (Figure 1)

Figure 1. Selecting file properties for a spreadsheet

Configure the Configure Genomic Properties dialog as shown (Figure 2) with the following steps:

  • Select Gene Expression from the Choose the type of genomic data drop-down menu

  • Select Feature in column label

  • Select Browse...

  • Select HG_U95Av2.na36.annot.csv from the microarray libraries folder

  • Select Set Column

  • Select Gene Symbol from the Choose column containing gene symbol/microRNA name dialog

  • Select Homo sapiens and hg19 from the Species and Genome Build drop-down menus

Figure 2. Configure the genomic properties dialog as shown

There is now an * after the spreadsheet name in the spreadsheet tree. This indicates an unsaved change has been made to the spreadsheet.

  • Select () to save the changes

    Importing the data set

    The original experiment is listed on the Gene Expression Omnibus as GSE848; however, this tutorial only uses a subset of the original experiment and should be downloaded from the Partek website tutorial page, Gene Expression Analysis with Batch Effects.

    • Download the zipped project folder, Breast_Cancer-GE.zip

    • Unzip the project folder to C:/Partek Training Data/ or a directory of your choosing

    This location should be easily accessible. The unzipped Breast_Cancer-GE project folder and a zipped annotation file will be added to the selected directory.

    • Unzip the included annotation file, HG_U95Av2.na32.annot.rar

    • Move the annotation file, HG_U95Av2.na32.annot, to the microarray libraries folder

    By default, the microarray libraries folder will be located at C:/Microarray Libraries, but the location may vary depending on your operating system and configuration.

    • Open Partek Genomics Suite

    • Select () from the main command bar

    • Navigate to the tutorial folder, Breast_Cancer-GE

    • Select Breast_Cancer.txt

    • Select Open (Figure 1)

    Figure 1. Opening a data file. The red Partek Genomics Suite icon is shown next to the data file (FMT file format)

    The spreadsheet will open as 1 (Breast_Cancer.txt) (Figure 2).

    Figure 2. Breast_Cancer.txt data file

    The summary at the bottom of the spreadsheet shows there are 18 rows and 12,631 columns in the spreadsheet. The first column contains the Filename listing the GEO GSM number, which also serves as an identifier for the microarray. Treatment, Time, and Batch are in columns 2, 3, and 4, respectively. Column 6 marks the beginning of the probesets. The data is log2 transformed.

    Detect differentially expressed genes with ANOVA

    Analysis of variance (ANOVA) is a very powerful technique for identifying differentially expressed genes in a multi-factor experiment. In this data set, ANOVA will be used to generate a list of genes that are significantly differentially regulated by each treatment.

    Adding factors and interactions

    When setting up the ANOVA, the primary factors of interest, Treatment and Time, should be included. We will also include the interaction between Treatment and Time, Treatment * Time, because we are interested in whether different treatments behave differently over time. From our exploratory analysis using PCA, we also know that Batch is a major source of variation and needs to be included. Including Batch as a random factor will allow us to account for the batch effect.

    • Select Detect differentially expressed genes from the Analysis section of the Gene Expression workflow

    • Select Treatment, Time, and Batch in the Experimental Factor(s) panel

    • Select Add Factor > to move the selections to the ANOVA Factor(s) panel

    • Select both Treatment and Time in the Experimental Factor(s) panel by holding Ctrl on the keyboard while selecting each

    • Select Add Interaction > to add the Treatment * Time interaction to the ANOVA Factor(s) panel (Figure 1)

    • Do not select OK or Apply. We still need to add linear contrasts to the ANOVA model

    Figure 1. Adding factors and interactions to the ANOVA
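To make the Treatment * Time interaction concrete, the sketch below rebuilds the sample layout from the data set description (four treatments at two time points with two replicates each, plus two controls) and counts the distinct interaction levels. The variable names are illustrative, and batch assignments are left out because they are not listed in the text.

```python
from collections import Counter

treatments = ["E2", "E2+ICI", "E2+Ral", "E2+TOT"]
time_points = [8, 48]
replicates = 2

# One (treatment, time) pair per sample, following the data set description.
samples = [(t, h) for t in treatments for h in time_points for _ in range(replicates)]
samples += [("Control", 0)] * 2  # two control samples

# Each distinct Treatment * Time pair is one level of the interaction term.
levels = Counter(f"{t} * {h}" for t, h in samples)

print(len(samples), "samples,", len(levels), "interaction levels")
print(sorted(levels))
```

The layout gives 18 samples and 9 interaction levels, including the E2 * 8, E2 * 48, and Control * 0 levels that are used when contrasts are defined.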

    Adding linear contrasts

    ANOVA will output a p-value and F ratio for each factor or interaction; to get the fold-change and ratio between the different levels of a factor or interaction, linear contrasts, or comparisons, must be added.

    • Select Contrasts... in the ANOVA dialog (Figure 1)

    • Select Yes for Data is already log transformed?

    • Select Treatment * Time from the Select Factor/Interaction drop-down menu

    We will add contrasts comparing each of the four treatment groups to the control group.

    • Select E2 * 8 and E2 * 48 from the Candidate Level(s) panel

    • Select Add Contrast Level > to move them to the top panel (Group 1) on the right-hand side

    The Group 1 panel will be renamed after the contents of the panel. We can specify a name for the group.

    • Set Label of the top panel to E2

    • Select Control * 0 from the Candidate Level(s) panel

    • Select Add Contrast Level > to move it to the bottom panel (Group 2) on the right-hand side

    • Set Label of the bottom panel to Control

    The lower panel (Group 2) is considered the reference level. Because the data is log2 transformed, the geometric mean will be used to calculate the fold change and mean ratio to place both on a linear scale instead of a log scale.

    • Select Add Contrast (Figure 2)

    Figure 2. Adding a contrast between E2 vs. Control at all time points.
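The relationship between log2 data, the ratio, and the fold change can be sketched as follows. The intensities are hypothetical, and the signed fold-change convention (ratios below 1 reported as negative values) is an assumption made for illustration rather than a documented detail of the contrast output.

```python
from statistics import mean


def contrast(group1_log2, group2_log2):
    """Compare two groups of log2 intensities; group 2 is the reference."""
    diff = mean(group1_log2) - mean(group2_log2)  # difference in log space
    ratio = 2 ** diff  # ratio of geometric means on the linear scale
    # Assumed convention: ratios below 1 become negative fold changes.
    fold_change = ratio if ratio >= 1 else -1 / ratio
    return ratio, fold_change


# Hypothetical log2 intensities for one probeset.
e2 = [9.0, 9.5, 10.0, 9.5]  # E2 * 8 and E2 * 48 samples
control = [7.5, 7.5]        # Control * 0 samples

print(contrast(e2, control))  # (4.0, 4.0)
print(contrast(control, e2))  # (0.25, -4.0)
```

Averaging log2 values and then exponentiating is equivalent to taking a geometric mean of the linear values, which is why the result lands on a linear scale.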

    To examine the time points of each treatment condition separately, we can select Add Combinations instead of Add Contrast. This adds every possible contrast for the levels in the Group 1 and Group 2 panels.

    • Select E2 * 8 and E2 * 48 from the Candidate Level(s) panel

    • Select Add Contrast Level > to move them to the top panel (Group 1) on the right-hand side

    • Select Control * 0 from the Candidate Level(s) panel

    • Select Add Contrast Level > to move it to the bottom panel (Group 2) on the right-hand side

    • Select Add Combinations to add contrasts for E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 (Figure 3)

    Figure 3. Add Combinations creates contrasts for every combination of levels from the two group panels.
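The difference between a single contrast and Add Combinations can be pictured as a Cartesian product of the two group panels. This is only a sketch of the pairing logic with illustrative label formatting, not the dialog's internal code.

```python
from itertools import product

group1 = ["E2 * 8", "E2 * 48"]
group2 = ["Control * 0"]

# Add Contrast: one comparison of all Group 1 levels against all Group 2 levels.
single = f"({', '.join(group1)}) vs. ({', '.join(group2)})"

# Add Combinations: one comparison per pairing of a Group 1 and a Group 2 level.
combos = [f"{a} vs. {b}" for a, b in product(group1, group2)]

print(single)
print(combos)
```

With two levels in Group 1 and one in Group 2, the product yields the two contrasts named in the step above.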

    For this tutorial, we will not be considering the time points of each treatment condition individually. We can remove the E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 contrasts.

    • Select E2 * 8 vs. Control * 0 and E2 * 48 vs. Control * 0 from the contrasts list

    • Select Delete

    We will now add contrasts for the other treatment conditions.

    • Add contrasts for E2+ICI vs. Control, E2+Ral vs. Control, and E2+TOT vs. Control following the steps outlined for E2 vs. Control

    There should now be four contrasts added to the contrasts panel (Figure 4).

    Figure 4. Fully configured contrasts for the tutorial

    • Select OK to add the contrasts to the ANOVA model

    The Contrasts... button should now read Contrasts Included in the ANOVA dialog.

    • Select OK to perform the ANOVA

    ANOVA results spreadsheet

    The result of the 3-way mixed model ANOVA is displayed in a new spreadsheet, ANOVA-3way (ANOVAResults) that is a child of the Breast_Cancer.txt spreadsheet. In ANOVAResults, each row represents a probe(set)/gene with the columns containing the results of the ANOVA (Figure 5).

    Figure 5. Viewing the ANOVA Results spreadsheet. Probe(sets)/genes are on rows and the ANOVA results are on columns.

    By default, the rows are sorted in ascending order by the p-value of the first factor, which places the most significantly differentially expressed gene between different treatments at the top of the spreadsheet.

    Each factor in the ANOVA adds p-value, F value, and SS value columns. F value is a ratio of signal to noise; high values indicate that the probe(set)/gene explains variation in the data set due to the factor. SS value is the sum of squares.

    Each contrast in the ANOVA adds p-value, ratio, and fold-change columns. The p-value is calculated using log space. The ratio and fold change are calculated using linear space.
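To show how the sum of squares and F value relate, here is a deliberately simplified sketch: a one-way, fixed-effects ANOVA for a single probeset with hypothetical values. The tutorial's model is a 3-way mixed model, so this only illustrates the signal-to-noise idea behind the F ratio; the p-value would additionally require the F distribution and is not computed here.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS between, SS within, F ratio) for a list of value lists."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    # Signal: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Noise: spread of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


# Hypothetical log2 intensities for one probeset in three groups.
groups = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0], [13.0, 14.0, 15.0]]
ss_between, ss_within, f_ratio = one_way_anova(groups)
print(ss_between, ss_within, f_ratio)  # 54.0 6.0 27.0
```

A large F ratio, as in this toy case, means the differences between group means are large compared with the variation inside the groups.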

    Viewing the sources of variation

    Sources of variation captured in the ANOVA can be viewed for the entire data set or for individual probe(sets)/genes.

    • Select View Sources of Variation from the Analysis section of the Gene Expression workflow

    The Sources of Variation plot will open in a new tab (Figure 6).

    Figure 6. Viewing the sources of variation plot. Non-random factors are included when ANOVA is run using the default REML model.

    This plot presents the signal to noise ratio across all probe(sets)/genes for each of the non-random factors and interactions in the ANOVA model. The y-axis represents the average mean square or F ratio, the ANOVA measure of variance, for all the probesets. Each bar is a factor and random error is also included. If the factor has a greater mean F ratio than Error, the factor contributed significant variation to the data set.

    Note that Batch is not included as a factor. This is because Batch is a random factor and is accounted for by the ANOVA model.

    The sources of variation for each probe(set)/gene can be viewed individually.

    • Right-click on a row header in the ANOVAResults spreadsheet

    • Select Sources of Variation from the pop-up menu

    The plot will open in a new tab. For additional plots that can be invoked from the ANOVA results spreadsheet, see the Visualizations user guide.


    Removing batch effects

    By including Batch in the ANOVA model, the variability due to the batch effect is accounted for when calculating p-values for the non-random factors. In this sense, the batch effect has already been removed. However, visualizing biological effects can be very difficult if batch effects are present in the original intensity data used to generate visualizations. We can modify the original intensity data to remove the batch effect using the Remove Batch Effect tool.

    Using the Remove Batch Effect tool

    The Remove Batch Effect tool functions much like ANOVA in reverse, calculating the variation attributed to the factor being removed then adjusting the original intensity values to remove the variation. Once the variation caused by the batch effect has been removed, tools like PCA or clustering can be used to visualize what the data would look like if the batch effect was not present.
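The idea of estimating the variation attributed to a factor and then subtracting it can be sketched for a single gene. This simplified version only re-centers each batch on the overall mean and ignores Treatment and Time, whereas the tool described here fits an ANOVA that includes those factors, so treat it as an illustration of the principle with hypothetical values.

```python
from statistics import mean


def remove_batch_offset(values, batches):
    """Shift each batch so its mean matches the overall mean for one gene."""
    overall = mean(values)
    batch_mean = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    # Subtract the batch-specific offset, keeping the overall level unchanged.
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# Hypothetical log2 intensities for one gene; batch B runs 2 units higher.
values = [8.0, 9.0, 10.0, 11.0]
batches = ["A", "A", "B", "B"]

corrected = remove_batch_offset(values, batches)
print(corrected)  # [9.0, 10.0, 9.0, 10.0]
```

After the adjustment both batches share the same mean, while the differences between samples within a batch are preserved.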

    • Select the 1 (Breast_Cancer.txt) spreadsheet

    • Select Stat from the main tool bar

    • Select Remove Batch Effect... (Figure 1)

    Figure 1. Invoking the Remove Batch Effect tool

    The Remove Batch Effects dialog will open. The tool functions by performing an ANOVA then modifying the original intensity values to remove the effects of the specified factor(s).

    • Select Treatment, Time, and Batch

    • Select Add Factor > to add them to the ANOVA Factor(s) panel

    • Select Batch in the ANOVA Factor(s) panel

    • Select Add Factor > to add Batch to the Remove Effect(s) of These Factor(s) panel

    By default, the results will be displayed in a new spreadsheet. Options to overwrite the current spreadsheet and specify the output file appear at the bottom of the dialog (Figure 2).

    Figure 2. Configuring the Remove Batch Effects tool to remove Batch and create a new spreadsheet

    • Select OK

    The new spreadsheet, 1-removeresult (batch-remove), will open in the Analysis tab (Figure 3).

    Figure 3. Viewing the new spreadsheet with batch effects removed

    Batch effects in PCA

    We can visualize the effects of removing the batch effects using PCA.

    • Select 1 (Breast_Cancer.txt) from the spreadsheet tree

    • Select () to plot the PCA scatter plot

    • Select ()

    • Set Drawing Mode to Mixed

    • Select the Ellipsoids tab

    • Select Add Centroid

    • Add Batch to the Grouping Variable(s) panel

    • Set the colors of the two centroids to pink and yellow as shown (Figure 4)

    Figure 4. Adding a centroid for Batch

    • Select OK to close the Add Centroid... dialog

    • Select OK to close the Configure Plot Properties dialog

    The two centroids are distinct, showing the batch effect (Figure 5).

    Figure 5. Viewing a batch effect using PCA. The batches are shown as the pink (A) and yellow (B) centroids. The clear separation of the centroids indicates a batch effect

    • Repeat the above steps for 1-removeresult (batch-remove)

    For 1-removeresult (batch-remove), the centroids of the two batches overlap, showing that the batch effect has been removed (Figure 6).

    Figure 6. Overlapping centroids for batches A and B show that the batch effect has been removed.

    Batch effects in ANOVA results visualizations

    Visualization of ANOVA results for single probe(sets)/genes also benefits from batch removal. To illustrate this, we first need to repeat our ANOVA using the new batch-remove intensity values spreadsheet.

    • Select the Analysis tab

    • Select 1-removeresult (batch-remove) in the spreadsheet tree

    • Select Stat from the main toolbar

    • Select ANOVA...

    • Add Treatment, Time, and Batch factors to the ANOVA Factor(s) panel

    • Add Treatment * Time interaction to the ANOVA Factor(s) panel

    • Select Contrasts...

    • Select Treatment from the Select Factor/Interaction drop-down menu

    • Select Yes for Data is already log transformed?

    • Set up contrasts of treatment vs. control for E2, E2+ICI, E2+Ral, and E2+TOT (Figure 7)

    Figure 7. Configuring ANOVA to compare treatment groups to control

    • Select OK to add contrasts

    • Change output file name to ANOVAResults_batch-remove

    • Select OK to perform the ANOVA

    The ANOVAResults_batch-remove spreadsheet will open in the Analysis tab.

    • Select the ANOVAResults spreadsheet

    • Right-click on the row header for row 2, TFF1

    • Select Dot Plot (Orig. Data) (Figure 8)

    Figure 8. Invoking a dot plot from the ANOVAResults spreadsheet

    A dot plot for trefoil factor 1 (TFF1) will open (Figure 9). The dot plot shows gene intensity values (y-axis) for each sample. Samples are grouped by Treatment.

    Figure 9. Viewing the dot plot for trefoil factor 1 (TFF1) across different treatment groups

    To visualize the batch effect we will make a few changes to the plot.

    • Select H/V to switch the horizontal and vertical axes

    • Select ()

    • Set Color to Batch

    • Set Size to Time

    • Set Connect to Treatment Combination (Figure 10)

    Figure 10. Configuring the dot plot (part 1 of 2)

    • Select the Labels tab

    • Select Column for In Point Labels

    • Select Time from the Column drop-down list (Figure 11)

    Figure 11. Configuring the dot plot (part 2 of 2)

    • Select OK

    The dot plot now clearly shows the batch effect (Figure 12). Samples within treatment groups are separated clearly between the two batches shown in blue and red.

    Figure 12. Viewing a dot plot showing a batch effect. Each dot is a sample. The y-axis is treatment combinations; the x-axis is the expression value of the TFF1 gene. Dots are colored by batch, sized by time, connected by treatment combination, and labeled by time.

    To view the effects of batch removal, we can view this dot plot for the ANOVAResults_batch-remove spreadsheet.

    • Select the Analysis tab

    • Select ANOVA-3way (ANOVAResults_batch-remove) from the spreadsheet tree

    • Repeat the steps shown above to create the dot plot for trefoil factor 1

    The dot plot invoked from the ANOVAResults_batch-remove spreadsheet shows that the batch effect has been removed, as the samples no longer clearly separate by color within treatment groups (Figure 13).

    Figure 13. Viewing the dot plot that shows batch effect removal. The plot configuration matches Figure 12.

    GO enrichment using a gene list

    Gene Ontology (GO) enrichment analysis compares a gene list to lists of genes associated with biological processes, cellular compartments, and molecular functions to provide biological insights. Once a list of genes has been created, it is possible to see which GO terms the genes are associated with and whether any GO terms are significantly enriched in the gene list.

    • Select the E2 vs. Control spreadsheet from the spreadsheet tree

    • Select Gene Set Analysis from the Biological Interpretation section of the Gene Expression workflow

    • Select Next > to continue with GO Enrichment

    • Select Next > to continue with 1/E2_vs_Control (E2 vs. Control)

    • Select Next > to continue with default parameter settings

    • Select Next > to continue with the default mapping file

    A new spreadsheet 1 (GO-Enrichment.txt) will open as a child spreadsheet of E2 vs. Control (Figure 1).

    Figure 1. GO Enrichment results spreadsheet

    GO terms are shown in rows and are sorted by ascending enrichment p-value.
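One common way to obtain an enrichment p-value is a one-sided hypergeometric test, which asks how surprising it is to see at least this many genes from a GO term in the list. The sketch below uses that test with made-up counts; whether the GO Enrichment tool uses this exact test, and the definition of the enrichment score as the negative natural log of the p-value, are assumptions made for illustration.

```python
from math import comb, log


def enrichment_p(list_hits, list_size, term_size, background_size):
    """One-sided hypergeometric p-value: P(X >= list_hits)."""
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of the 5 listed genes belong to a GO term
# that contains 4 of the 20 genes in the background.
p = enrichment_p(list_hits=3, list_size=5, term_size=4, background_size=20)
score = -log(p)  # assumed definition of an enrichment score

print(round(p, 4), round(score, 2))
```

Sorting GO terms by ascending p-value, as in the results spreadsheet, is equivalent to sorting by descending score under this definition.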

    To visualize the results, we can launch the Gene Ontology Browser.

    • Select View from the main tool bar

    • Select Gene Ontology Browser

    The Gene Ontology Browser will open in a new tab (Figure 2).

    Figure 2. Viewing GO enrichment results in the Gene Ontology Browser

    The bar chart shows the GO terms with the highest enrichment scores for the gene list.

    To learn more about GO enrichment and using the Gene Ontology Browser, please consult the Gene Ontology Enrichment tutorial.

    Hierarchical clustering using a gene list

    Opening a gene list as a child spreadsheet

    Gene lists can be visualized and their ability to distinguish samples evaluated using a hierarchical clustering heat map. Because of the batch effect in this data set, we will perform hierarchical clustering using batch-corrected intensity values. To do this, we need to open the fourtreatments list of differentially expressed genes as a child spreadsheet of the batch-remove spreadsheet.

    • Select fourtreatments from the spreadsheet tree

    • Select () to close the spreadsheet

    • Select 1-removeresult (batch-remove) from the spreadsheet tree

    • Select File from the main tool bar

    • Select Open as child...

    • Select fourtreatments using the file browser

    The fourtreatments spreadsheet will open as a child spreadsheet of batch-remove (Figure 1).

    Figure 1. The fourtreatments spreadsheet is open as a child spreadsheet of batch-remove. Visualizations performed using fourtreatments will pull intensity values from batch-remove.

    Visualizations performed using the fourtreatments spreadsheet will now use intensity values from the batch-remove spreadsheet.
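The parent/child relationship can be thought of as a lookup: the child list only names genes, and the values plotted come from the parent spreadsheet. The sketch below uses hypothetical genes and intensities, then standardizes each gene (shift to mean zero, scale to standard deviation one), which is an assumed reading of what a Standardize option does before a heat map is drawn.

```python
from statistics import mean, pstdev

# Hypothetical batch-corrected intensities: one list per gene, one value per sample.
batch_remove = {
    "geneA": [8.0, 10.0, 12.0],
    "geneB": [5.0, 5.5, 6.0],
    "geneC": [7.0, 7.5, 7.0],
}
fourtreatments = ["geneA", "geneB"]  # hypothetical gene list

# The child list selects genes; the values come from the parent spreadsheet.
subset = {gene: batch_remove[gene] for gene in fourtreatments}


def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


scaled = {gene: standardize(values) for gene, values in subset.items()}
print(scaled["geneA"])
```

After standardization every gene is on the same scale, so a sample at a gene's average expression has a value of zero regardless of how strongly that gene is expressed overall.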

    Hierarchical clustering using a gene list

    To invoke hierarchical clustering, follow the steps below.

    • Select Cluster Based on Significant Genes from the Visualization section of the Gene Expression workflow

    • Select Hierarchical Clustering

    • Select OK

    • Select 1-removeresult/1 (fourtreatments) from the drop-down menu

    • Select Standardize for Expression normalization (Figure 2)

    Figure 2. Configuring the Cluster the significant genes dialog

    • Select OK

    The hierarchical clustering heat map will open in a new tab (Figure 3).

    Figure 3. Hierarchical clustering of genes with significantly different expression across the treatment groups

    Genes without changes in expression are given a value of zero and are colored black. Up-regulated genes have positive values and are displayed in red. Down-regulated genes have negative values and are displayed in green. Each sample is represented in a row while genes are represented as columns. Dendrograms illustrate clustering of samples and genes. To learn more about configuring the hierarchical clustering heat map, see the Hierarchical Clustering Analysis user guide.

    For detailed information about the methods used for clustering, refer to the Partek Manual Chapter 8: Hierarchical & Partitioning Clustering.

    Exploring the data set with PCA

    Principal Components Analysis (PCA) is an excellent method to visualize similarities and differences between the samples in a data set. PCA can be invoked through a workflow, by selecting () from the main command bar, or by selecting Scatter Plot from the View section of the main toolbar. We will use a workflow.

    • Select Gene Expression from the Workflows drop-down menu

    • Select PCA Scatter Plot from the QA/QC section of the Gene Expression workflow

    The PCA scatter plot will open as a new tab (Figure 1).


    Figure 1. Viewing the PCA scatter plot. Each point is a sample. Samples are colored by treatment.

    In this PCA scatter plot, each point represents a sample in the spreadsheet. Points that are close together in the plot are more similar, while points that are far apart in the plot are more dissimilar.

    To better view the data, we can rotate the plot.

    • Select () to activate Rotate Mode

    • Click and drag to rotate the plot

    Rotating the plot allows us to look for outliers in the data on each of the three principal components (PC1-3). The percentage of the total variation explained by each PC is listed by its axis label. The chart label shows the sum percentage of the total variation explained by the displayed PCs.
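The percentage of variation explained by each PC comes from the eigenvalues of the covariance matrix of the data. The sketch below works this out for a toy data set with only two variables, where the eigenvalues have a closed form; the real spreadsheet has thousands of probesets, and whether the PCA tool uses a covariance or a correlation matrix is not stated here, so the covariance choice is an assumption.

```python
from math import sqrt
from statistics import mean


def explained_variance_2d(xs, ys):
    """Percent of total variance on each principal component of 2-D data."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + spread, half_trace - spread]
    total = sum(eigenvalues)
    return [100 * ev / total for ev in eigenvalues]


# Hypothetical values of two variables measured on four samples.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]

percentages = explained_variance_2d(xs, ys)
print([round(p, 1) for p in percentages])  # [90.0, 10.0]
```

The percentages always sum to 100 across all components, so the total shown for three displayed PCs indicates how much of the variation the plot captures.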

    We can change the plot properties to better visualize the effects of different variables.

    • Select () to open the Configure Plot Properties dialog

    • Set Shape to 4. Batch

    • Set Size to 3. Time

    • Set Connect to 5. Treatment Combination

    • Select OK (Figure 2)

    Figure 2. Configuring plot properties to color by treatment, shape by batch, size by time, and connect by treatment combination

    The PCA scatter plot now shows information about treatment, batch, and time for each sample (Figure 3).

    Figure 3. PCA scatter plot showing treatment, batch, and time information for each sample. A batch effect is clearly visible.

    PCA is particularly useful for identifying outliers and batch effects in data sets. We can see a batch effect in this data set as samples separate by batch. To make this clearer, we can add ellipses by Batch.

    • Select () to open the Configure Plot Properties dialog

    • Select Ellipsoids from the tab

    • Select Add Ellipse/Ellipsoid

    • Select Ellipse

    • Select Batch from the Categorical Variable(s) panel and move it to the Group Variable(s) panel

    • Select OK

    • Select OK to close the dialog

    The ellipses help illustrate that the data is separated by batch (Figure 4).

    Figure 4. Ellipses around batch groups show that samples separate by batch

    Ways to address the batch effect in the data set will be detailed later in this tutorial.
