Scientists often develop lists of genes, probes, transcripts, SNPs, and genomic regions of interest from analysis tools, research papers, and databases. Using Partek Genomics Suite, these lists can be integrated with genomics data sets, analyzed with powerful statistics, and visualized for new insights.
This user guide will illustrate:
This user guide does not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature in Partek Genomics Suite that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a Partek Genomics Suite feature on an imported list that you think should be included in this user guide, please let us know.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
The Gene Ontology (GO) Enrichment p-value calculation uses either a Chi-Square or Fisher’s Exact test to compare the genes included in the significant gene list to all possible genes present in the experiment (the background genes). For a microarray experiment, the background genes consist of all genes on the chip/array; for a next generation sequencing experiment, all genes in the species transcriptome are considered background genes.
Because the calculation is essentially comparing overlapping sets of genes and does not use intensity values, GO Enrichment can be performed on an imported gene list even without any numerical values. GO Enrichment is available through the Gene Expression workflow.
If no annotation file has been specified for the gene list, GO Enrichment will use the full species transcriptome as the background genes. This is suitable for next generation sequencing experiments, but for microarray experiments only the genes on the chip/array are an appropriate background. Please contact our technical support department for assistance with this step if needed.
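Because the test only compares overlapping gene sets, it can be illustrated without any expression values. The sketch below computes a one-sided Fisher's exact p-value (the upper tail of the hypergeometric distribution) for a single GO category. It is a minimal illustration, not Partek's implementation: the one-sided form, the function name `enrichment_p_value`, and the gene names are assumptions made for the example. Note how the result depends on the background set, which is why the chip genes rather than the whole transcriptome matter for microarray data.

```python
from math import comb


def enrichment_p_value(significant, background, category):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    significant: genes in the significant gene list
    background:  all background genes (chip genes or species transcriptome)
    category:    genes annotated to one GO category
    """
    background = set(background)
    significant = set(significant) & background  # ignore genes outside the background
    category = set(category) & background

    N = len(background)                 # background size
    K = len(category)                   # background genes in the category
    n = len(significant)                # significant genes
    k = len(significant & category)     # significant genes in the category

    # P(X >= k) for X ~ Hypergeometric(N, K, n), using exact integer counts
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical example: 10 background genes, all 5 significant genes fall in the category
background = [f"G{i}" for i in range(10)]
category = ["G0", "G1", "G2", "G3", "G4"]
print(enrichment_p_value(category, background, category))  # 1/252
```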
Like GO Enrichment, Pathway Enrichment does not require numerical values; instead, it operates on lists of genes: a list of significant genes vs. background genes. Consequently, Pathway Enrichment may be used with an imported list of genes even without any numerical values. The list of background genes is set to the species transcriptome by default, but can be set to a specific set of genes if the gene list has been associated with an annotation file.
A gene list can be used to filter another spreadsheet. As an example, we will filter the results of an ANOVA on microarray data using a gene list. This will create a spreadsheet with ANOVA results for only the genes included in our gene list.
Open the filtering gene list and target spreadsheets
Select the target spreadsheet in the spreadsheet tree; in this example, it is the ANOVA result spreadsheet, which has genes on rows
Select Filter from the main toolbar
Select Filter Rows Based on a List... from Filter Rows (Figure 1)
Select the matching column of your target spreadsheet from the Key column drop-down menu; here we have selected 4. Gene Symbol (Figure 2)
Select the filtering gene list from the Filter based on spreadsheet drop-down menu; here we have selected 1 (Gene List.txt)
Select the matching column of your filtering gene list from the Key column drop-down menu; here we have selected 1. Symbol
Select OK to apply the filter
The target spreadsheet will display the filtered rows (Figure 3). Note that the number of rows has gone from 22,283 prior to filtering (Figure 1) to 153 after filtering (Figure 3).
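Conceptually, Filter Rows Based on a List keeps only the target rows whose key value appears in the key column of the filtering list. The sketch below shows that idea with rows stored as Python dictionaries. The column names mirror the example above (Gene Symbol and Symbol), while the function name and the gene values are hypothetical; exact, case-sensitive matching is an assumption, so check how identifiers are written in both spreadsheets.

```python
def filter_rows_by_list(target_rows, target_key, list_rows, list_key):
    """Keep target rows whose key value appears in the filtering list.

    Rows are dicts mapping column name -> value. Matching is exact and
    case-sensitive (an assumption for this sketch).
    """
    wanted = {row[list_key] for row in list_rows}
    return [row for row in target_rows if row[target_key] in wanted]


# Hypothetical ANOVA results (target) and gene list (filter)
anova_results = [
    {"Gene Symbol": "TP53", "p-value": 0.001},
    {"Gene Symbol": "BRCA1", "p-value": 0.20},
    {"Gene Symbol": "EGFR", "p-value": 0.03},
]
gene_list = [{"Symbol": "TP53"}, {"Symbol": "EGFR"}]

filtered = filter_rows_by_list(anova_results, "Gene Symbol", gene_list, "Symbol")
print([row["Gene Symbol"] for row in filtered])
```

Using a set for the filtering keys keeps each lookup constant-time, which matters when the target has tens of thousands of rows, as in the 22,283-row example.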
To use this filtered list for downstream analysis, we can save it.
Right-click the open spreadsheet in the spreadsheet tree
Select Clone...
Use the Clone Spreadsheet dialog to name the new spreadsheet and choose its place in the spreadsheet hierarchy
Select OK
The new spreadsheet will open. If you want to use the new spreadsheet again in the future, be sure to save it.
If your imported data contains a list of p-values, you can use any of the available multiple test corrections.
Select Stat from the main toolbar
Select Multiple Test
Select Multiple Test Corrections to launch a dialog with the available options (Figure 4); corrected p-value column(s) will be added to the right of the selected p-value column(s)
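As an illustration of what a corrected p-value column contains, the sketch below applies the Benjamini-Hochberg step-up false discovery rate adjustment to a list of p-values. This is one widely used correction chosen for the example; it is not a statement of which options the dialog offers or of how Partek computes them, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up FDR adjustment; results keep the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```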
A variety of profile plots can be used to visualize the numerical data associated with your imported gene list.
Select View from the main toolbar
Select any applicable option
If you have imported numerical data associated with genes (like p-values or fold-changes), you can visualize these values in the Genome Browser once an annotation file that contains genomic location information has been associated with the spreadsheet.
Right-click on a row header in the imported gene list spreadsheet
Select Browse to location
If the annotations have been configured properly, you should see a Regions track for the first column of numerical data, a cytoband track, and an annotation track. You can also add another track to display a second column of numerical data.
Select New Track
Select Add a track from spreadsheet
Select Next >
A new track titled Regions will be added.
Select Regions in the track preferences panel to edit it
Select the other numerical column in the Bar height by drop-down menu
For a gene list with expression values on each sample, clustering can be performed. Access the clustering function through the toolbar, not from a workflow. The workflow implementations assume that the data to be clustered are found on a parent spreadsheet and the list of genes is in a child spreadsheet.
Select Tools from the main toolbar
Select Discover then Hierarchical Clustering
Hierarchical Clustering assumes that samples are on rows and genes are on columns, so consider transposing your data if this is not the case. If you have only one column or row of data, cluster only on the dimension that has more than one entry by deselecting either Rows or Columns under What to Cluster in the Hierarchical Clustering dialog.
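To make the orientation requirement concrete, the sketch below transposes a small table with genes on rows into one with samples on rows. Partek Genomics Suite provides its own transpose command; this only shows what the reorientation does. The table layout (first row as header) and the values are hypothetical.

```python
def transpose(table):
    """Swap rows and columns of a rectangular table (list of equal-length lists)."""
    return [list(column) for column in zip(*table)]


# Genes on rows, samples on columns; the first row is the header
genes_on_rows = [
    ["Gene", "Sample1", "Sample2"],
    ["TP53", 5.1, 6.3],
    ["EGFR", 7.2, 7.0],
]
samples_on_rows = transpose(genes_on_rows)
print(samples_on_rows)
```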
The preferred method for importing a generic data spreadsheet into Partek Genomics Suite is as a text file. Here, we illustrate importing a list of genes with p-value and fold-change from an experiment comparing two conditions.
Select File from the main toolbar
Select Import
Select Text (.csv .txt)...
Select the text file using the file browser to launch the Import .txt, .tsv, or .csv File dialog
The File Type section of the Import dialog includes a preview of the text file and import options (Figure 1).
The columns in the import file can be separated by a tab, a comma, or any other character.
For most applications, the items on the list should be in rows while attributes or values should be in columns. If a list is oriented with items on columns, select Transpose the file to import a transposed spreadsheet.
Select Next > to move to the Data Type section
Select your data type; here we have chosen Genomic Data because it is a gene list (Figure 2)
We have also deselected Is the data log transformed (LOG_base (x+offset) ) ?
Selecting Genomic Data will open a dialog after import to configure properties for the imported list including selecting the type of genomic data, the location of genomic features in the spreadsheet, the annotation column with gene symbols, the chip or reference source and annotation file, the species, and reference genome build.
Select Next >
The next step is to identify where the data starts and where the optional header is found using Identify Column Labels, Start of Data (Figure 3). The line that contains the header (if present) must precede the data. If there are lines to be skipped in the file (like comments), they may only appear at the top of the file, before the header line or data begin.
If there are many comment lines at the start of the file, you may need to select View Next 5 Records to get to the row that contains the column header. If you accidentally move past the screen that contains the header or data rows, select View Previous 5 Records.
If there are missing numerical values or empty cells in your input list, insert a special character or symbol (?, N/A, NA, etc.) in the missing cells. You will specify this symbol in the Missing Data Representation section of the dialog; only one symbol can be used to represent missing values, and the default missing value indicator is ?.
If a header row is present, select Col Lbls to allow you to select a column header row
Select the row where the data begins using the Begin Data selector
If any cells have a missing value, you can signify this with a special symbol selected using the Missing Data Representation panel
Select Next >
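The header, Begin Data, and missing-data settings above follow simple rules: leading lines are skipped, the header must precede the data, and one symbol stands for a missing value. The sketch below applies the same rules to a delimited text string using Python's standard `csv` module. The function name, its parameters, and the sample text are hypothetical and only mirror the dialog options; line indices are 0-based.

```python
import csv
import io


def read_table(text, delimiter="\t", header_line=None, data_start=0, missing="?"):
    """Parse delimited text into (header, rows).

    header_line: 0-based index of the header line, or None if there is no header
    data_start:  0-based index of the first data line (must follow the header)
    missing:     the single symbol marking a missing value; it becomes None
    Lines before the header/data (e.g. comments) are skipped.
    """
    if header_line is not None and header_line >= data_start:
        raise ValueError("the header line must precede the first data line")
    lines = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = lines[header_line] if header_line is not None else None
    rows = [[None if cell == missing else cell for cell in line]
            for line in lines[data_start:] if line]
    return header, rows


text = "# exported gene list\nSymbol\tp-value\tfold-change\nTP53\t0.001\t2.5\nEGFR\t?\t-1.8\n"
header, rows = read_table(text, header_line=1, data_start=2)
print(header)
print(rows)
```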
The Preview text encoding section (Figure 4) previews the first five lines of the file, allowing you to check if the text encoding is correct.
If the text does not appear properly, use the Specify the text encoding: drop-down menu to choose the correct encoding
Select Next >
The final section of the Import .txt, .tsv, or .csv File dialog is Verify Type & Attribute of Data Columns (Figure 5). While data column type and attribute can be modified after import, it is easier and faster to select the proper options during import, as multiple columns can be selected at once in this dialog.
Check and modify column types and attributes
If there is an identifier like gene symbol or SNP, the Type field for that column should be set to text and Attribute should be set to label. Numeric values (intensities, p-values, fold-changes, etc.) should have Type set to double and Attribute set to response. The other possible value for Attribute is factor, which describes sample data. The user interface in this dialog allows you to select multiple columns at once (Ctrl+left click and Shift+left click). The interface controls are detailed in the dialog (Figure 5).
Select Finish to import the text file and open it as a spreadsheet
If Genomic data was selected in the Data Type section, the Configure Genomic Properties dialog will open (Figure 6). These options will be discussed in the next section when we add an annotation file.
Select OK
The imported spreadsheet will open (Figure 7).
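The rule of thumb for Type and Attribute can be expressed as a small check per column: if every non-missing value parses as a number, treat it as double/response, otherwise as text/label. The sketch below is a hypothetical helper for pre-checking a file before import, not part of Partek Genomics Suite; it never suggests factor, since sample-data columns need to be set by hand.

```python
def _is_number(value):
    """True if the value can be read as a floating-point number."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def guess_column_settings(values):
    """Suggest (Type, Attribute) for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    if present and all(_is_number(v) for v in present):
        return ("double", "response")
    return ("text", "label")


print(guess_column_settings(["TP53", "EGFR"]))
print(guess_column_settings(["0.001", None, "-1.8"]))
```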
A region list must contain the chromosome, start location, and stop locations as the first three columns. The chromosome number in the region list must be compatible with the genomic annotation for the species if you plan to use any feature (like motif detection) that requires reference sequence information.
Import the region list as described above for text files with the following options
Select Other for data type
Set chromosome as a text field
Set location start and stop as either integer or text fields
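Because the first three columns must be chromosome, start, and stop, and the chromosome names must match the species annotation, a quick pre-import check can catch problems early. The sketch below parses one row and normalizes chromosome names to a chosen prefix. Whether your reference uses names like chr7 or 7 is an assumption you should verify; the function name and the `chrom_prefix` parameter are hypothetical.

```python
def parse_region(fields, chrom_prefix="chr"):
    """Parse the first three columns of a region-list row as (chrom, start, stop).

    Chromosome names are normalized to carry `chrom_prefix` (e.g. '7' -> 'chr7');
    pass chrom_prefix="" to keep names unchanged. Extra columns are ignored.
    """
    chrom, start, stop = fields[0].strip(), int(fields[1]), int(fields[2])
    if chrom_prefix and not chrom.lower().startswith(chrom_prefix.lower()):
        chrom = chrom_prefix + chrom
    if start > stop:
        raise ValueError(f"start {start} is after stop {stop}")
    return chrom, start, stop


print(parse_region(["7", "100", "200"]))
```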
Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select List of genomic regions from the Configure Spreadsheet dialog to add region to the properties (Figure 1)
Figure 1. Adding region to the properties of a spreadsheet
The spreadsheet properties will now include region. Alternatively, region can be added as a spreadsheet property from the Configure Genomic Properties dialog by selecting Advanced..., choosing region from the drop-down menu, selecting Add, and selecting OK.
If you would like to do any operation that requires looking up the reference genomic sequence information for the regions based on genomic location, you will need to specify the species for this region list.
Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select species from the Add Property drop-down menu and click Add
Specify the Species Name and Genome Build from the drop-down menus
Select OK
Starting with a region list, you may detect either known or de novo motifs using the ChIP-Seq workflow if your spreadsheet has been associated with a species and a reference genome.
Select ChIP-Seq from the Workflows drop-down menu
Select Motif detection from the Peak Analysis section of the workflow
Both Discover de novo motifs and Search for known motifs can be performed. Motif detection requires the sequence information of the genome; you can specify either a .2bit file or a .fa file, which can be used to create a .2bit file
If you have a region list or a .BED file and you have a microarray experiment with data, you can summarize the microarray data by the genomic coordinates contained in the region list. For example, suppose the region list contains CpG islands, the experiment contains methylation percentage values for probes (β values), and you would like to summarize the methylation values of all probes in each CpG island.
Import the region list (or .BED file)
Be sure that you have added the region property. The list of region coordinates (chromosome, start, stop) from the region list will be mapped against the reference genome specified for the microarray data so specifying Species and Genome Build for your region list is unnecessary.
Open the microarray data spreadsheet; this spreadsheet should have an associated annotation file that contains genomic location information.
Samples should be on rows and data on columns in the microarray data spreadsheet.
Select the region list spreadsheet
Right-click any column header in the region list spreadsheet
Select Insert Average from the pop-up menu (Figure 2)
Figure 2. Adding the average values for a region list
Select the microarray data spreadsheet containing the values you want to average for each region from the Get average from spreadsheet drop-down menu
There are three options for averaging the data (Figure 3). Mean of samples significant in region is used when the region list has SampleIDs from the microarray data set associated with each region. In this case, only the microarray data set samples specified for each region would be included in the mean calculation. Mean of all samples will add columns for the mean value of all probes for all samples and the number of probes for all samples in each region. Mean value for all samples separately will add two columns for each sample with the mean value of all probes for that sample and the number of probes for that sample in each region.
We have selected Mean of all samples
Select OK (Figure 3)
Figure 3. Selecting options for adding average values for regions
Columns will be added to the region list spreadsheet. Here, we have added two columns with the average β-value for all samples in each CpG island and the number of probes in each CpG island (Figure 4).
Figure 4. Added average beta values and number of probes per CpG island
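The Mean of all samples option can be pictured as: find the probes whose annotated position falls inside each region, then average their values across all samples and count them. The sketch below does this with plain dictionaries. Inclusive coordinates, pooling all values into one mean, and skipping missing values are assumptions; the probe IDs and β values are hypothetical. The linear scan over probes is only for clarity; a real implementation would index probes by position.

```python
def region_means(regions, probe_locations, samples):
    """For each region, return (mean over all samples and probes in it, probe count).

    regions:         list of (chrom, start, stop), coordinates inclusive (assumed)
    probe_locations: dict probe_id -> (chrom, position), from the annotation file
    samples:         dict sample_id -> dict probe_id -> value (samples on rows)
    Missing values (None) are skipped; regions without probes get mean None.
    """
    results = []
    for chrom, start, stop in regions:
        # Probes whose annotated position falls inside this region
        probes = [probe for probe, (c, pos) in probe_locations.items()
                  if c == chrom and start <= pos <= stop]
        # Pool the values of those probes across every sample
        values = [row[p] for row in samples.values() for p in probes
                  if row.get(p) is not None]
        mean = sum(values) / len(values) if values else None
        results.append((mean, len(probes)))
    return results


probe_locations = {"cg1": ("chr1", 100), "cg2": ("chr1", 150), "cg3": ("chr2", 100)}
samples = {
    "S1": {"cg1": 0.2, "cg2": 0.4, "cg3": 0.9},
    "S2": {"cg1": 0.4, "cg2": 0.6, "cg3": 0.1},
}
regions = [("chr1", 90, 200), ("chr2", 1, 50)]
print(region_means(regions, probe_locations, samples))
```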
If you have two or more region lists with coordinates on the same reference genome, you can compare them to identify overlapping regions.
Open all region list spreadsheets that you want to compare
Select Tools from the main toolbar
Select Find Region Overlaps (Figure 5)
Figure 5. Selecting Find Region Overlaps
The Find Region Overlaps tool has two modes of operation. The first, Report all regions, creates a new spreadsheet with any regions that did not intersect and all regions of intersection between any of the input lists. For each intersection, the start and stop coordinates of the intersection and the percent overlap between the intersected region and each of the regions in the input lists are reported. The second, Only report regions present in all lists, creates a new spreadsheet with the intersected regions found in all the lists.
Select your preferred mode; we have selected Only report regions present in all lists
Select Add New Spreadsheet to add any spreadsheets you want to compare; we are comparing two region list spreadsheets (Figure 6)
Select OK
Figure 6. Configuring Find Overlapping Regions
A new region list spreadsheet will be created (Figure 7). The new region list is a temporary spreadsheet so be sure to save it if you want to keep it.
Figure 7. Spreadsheet with regions present in all lists
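For two lists, the intersection and percent-overlap values described above can be sketched as follows. The code treats coordinates as 1-based and inclusive, which is an assumption about how lengths are counted; the function name and output field names are hypothetical. It uses a simple pairwise comparison for readability rather than the sorted sweep a production tool would use.

```python
def intersect_regions(list_a, list_b):
    """Report intersections between two region lists (same chromosome naming).

    Coordinates are treated as 1-based and inclusive (an assumption). For each
    intersection, return its bounds and the percent of each input region covered.
    """
    overlaps = []
    for chrom_a, start_a, stop_a in list_a:
        for chrom_b, start_b, stop_b in list_b:
            if chrom_a != chrom_b:
                continue
            start, stop = max(start_a, start_b), min(stop_a, stop_b)
            if start > stop:
                continue  # no shared bases
            length = stop - start + 1
            overlaps.append({
                "chrom": chrom_a, "start": start, "stop": stop,
                "pct_a": 100.0 * length / (stop_a - start_a + 1),
                "pct_b": 100.0 * length / (stop_b - start_b + 1),
            })
    return overlaps


list_a = [("chr1", 100, 199)]
list_b = [("chr1", 150, 249), ("chr2", 100, 199)]
print(intersect_regions(list_a, list_b))
```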
To be annotated using the Annotate SNVs tool, an imported SNV position list must have four columns per locus:
Position of the SNP listed as chr.basePosition
Sample ID or name
The reference base
The SNP call (sample genotype base)
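Before importing, each row can be checked against this four-column layout. The sketch below splits the position at the last period so that both numeric and prefixed chromosome names are accepted. The exact spelling of the chromosome part in chr.basePosition (for example 1.1234567), and the restriction to single-base A/C/G/T alleles, are assumptions for the example; compare with Figure 8. The function name is hypothetical.

```python
def parse_snv_row(fields):
    """Split one SNV row into (chromosome, position, sample, ref, call).

    The position column is expected as chromosome.basePosition; the split is
    made at the last '.'. Single-base A/C/G/T alleles are assumed.
    """
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}")
    position, sample, ref, call = (f.strip() for f in fields)
    chrom, _, base = position.rpartition(".")
    if not chrom or not base.isdigit():
        raise ValueError(f"position {position!r} is not chr.basePosition")
    for allele in (ref, call):
        if allele.upper() not in {"A", "C", "G", "T"}:
            raise ValueError(f"unexpected base {allele!r}")
    return chrom, int(base), sample, ref.upper(), call.upper()


print(parse_snv_row(["1.1234567", "Sample1", "a", "G"]))
```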
Prepare input list as shown (Figure 8) with four columns describing the position, sample, reference base, and sample genotype base for each SNV
Figure 8. An imported SNV list must follow this format to be annotated by the Annotate SNVs tool. The first column must be the position, and the position must follow the format shown, chr.basePosition
Save as either a tab-separated or comma separated file
Import the table as a text file
Select Genomic data for What type of data is this file?
Set the position column Type to text
Set the other columns Type to categorical
Select Genomic location instead of marker IDs from the Choose the type of genomic data drop-down menu of the Configure Genomic Properties dialog
Specify the Species and Genome Build
Select OK
The Annotate SNVs tool can now be invoked on this spreadsheet to generate an annotation spreadsheet (Figure 9).
Figure 9. Annotate SNVs creates a new spreadsheet annotating each SNV from the source list
There are many useful visualizations, annotations, and biological interpretation tools that can operate on a gene list. In order for these features to work with an imported list, an annotation file must be associated with the gene list. Additionally, many operations that work with a list of significant genes (like GO- or Pathway-Enrichment) require comparison against a background of “non-significant” genes. The quickest way to accomplish both is to use the background of “all genes” for that organism provided by an annotation source like RefSeq, Ensembl, etc. in .pannot (Partek annotation), .gff, .gtf, .bed, tab- or comma-delimited format. If the file is not already in a tab-separated or comma-delimited format, you may import, modify, and save the file in the proper file format.
Select File from the main toolbar
Select Genomic Database under Import (Figure 1)
Select the annotation file; in this example, we select a .pannot file downloaded from the Partek distributed library file repository: hg19_refseq_14_01_03_v2.pannot
Delete or rearrange the columns as necessary; we have placed first the column with identifiers that correspond to our gene list (these should be unique IDs)
Select File then Save As Text File... to save the annotation file; we have named it Annotation File (Figure 2)
Now we can add the annotation file to our imported gene list.
Right click 1 (gene_list.txt) in the spreadsheet tree
Select Properties from the pop-up menu
This brings up the Configure Genomic Properties dialog (Figure 3).
Select Browse under Annotation File
Choose the annotation file; we have chosen Annotation File.txt
If this is the first time you have used this annotation file, the Configure Annotation dialog will launch. This is used to choose the columns with the chromosome number and position information for each feature. Our example annotation file has chromosome, start, and stop in separate columns.
Select the proper column configuration options (Figure 4)
Select Close to return to the Configure Genomic Properties dialog
Select Set Column: to open the Choose column with gene symbols or microRNA names dialog (Figure 5)
Select the appropriate column; here the default choice of 1. Symbol is appropriate
Select OK to return to the Configure Genomic Properties dialog
Select the appropriate species and genome build options; we have selected Homo sapiens and hg19 (Figure 6)
Select OK
The annotation file has been associated with the spreadsheet, and additional tasks can now be performed on the data; for example, because the annotation includes genomic locations, you can draw a chromosome view of the data.
If an annotation file has been associated with a spreadsheet, annotations from the file can be added as columns in the spreadsheet when each identifier is on a row.
Right click on a column header
Select Insert Annotation
Select columns to add from Column Configuration; we have selected Chromosome, Start, and Stop (Figure 7)
Select OK
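Insert Annotation behaves like a lookup: each row's identifier is matched against the annotation file's ID column, and the chosen columns are copied over. The sketch below reads a tab-delimited annotation table with the standard `csv.DictReader` and adds Chromosome, Start, and Stop to a gene list. The gene names and coordinates are hypothetical placeholders; values stay as text, and if an identifier were repeated the last record would win, which is why the ID column should be unique.

```python
import csv
import io


def load_annotation(text, id_column):
    """Read a tab-delimited annotation table into {identifier: record}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {record[id_column]: record for record in reader}


def insert_annotation(rows, key_column, annotation, columns):
    """Add the chosen annotation columns to each row (None if the ID is not found)."""
    for row in rows:
        record = annotation.get(row[key_column], {})
        for column in columns:
            row[column] = record.get(column)
    return rows


annotation_text = "Symbol\tChromosome\tStart\tStop\nGENE_A\tchr1\t1000\t2000\n"
annotation = load_annotation(annotation_text, "Symbol")
gene_list = [{"Symbol": "GENE_A"}, {"Symbol": "GENE_Z"}]
insert_annotation(gene_list, "Symbol", annotation, ["Chromosome", "Start", "Stop"])
print(gene_list)
```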
Partek Genomics Suite is a comprehensive suite of advanced statistics and interactive data visualization specifically designed to reliably extract biological signals from noise. Designed for high-dimensional genomic studies containing thousands of samples, Partek Genomics Suite is fast, memory efficient and will analyze large data sets on a personal computer. It supports a complete workflow including convenient data access tools, identification and annotation of important biomarkers, and construction and validation of predictive diagnostic classification systems.
Additional information can be found in the manual for Partek Genomics Suite version 6.6.
A list of SNPs using dbSNP IDs can be imported as a text file and associated with an annotation file as described for a list of genes. The annotation file you use to annotate the SNPs should minimally contain the chromosome number and physical position of each locus.
Novel SNPs or SNPs that are not found in your annotation source must be imported as a region list. For this, follow the procedure outlined in , but use the SNP name in place of a region name.
Starting with a list of SNPs that have been associated with genomic loci using an annotation file and assigned a species with genome build, you can use Find Overlapping Genes to annotate these SNPs with the closest genes.
Select Tools from the main toolbar
Select Find Overlapping Genes (Figure 1)
Figure 1. Adding overlapping genes to a SNP list
Select Add a New Column with the Gene Nearest to the Region from the method dialog
The Report Regions from the specified database dialog will open.
Select your preferred database. Be sure to match the species and genome build of your SNP list
Select OK
This will add 3 columns to the list of SNPs spreadsheet including Nearest Feature, which will indicate the nearest gene and strand (Figure 2).
Figure 2. Find Overlapping Genes adds three columns to a SNP list: overlapping features, nearest feature, and distance to nearest feature (bps)
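The three added columns (overlapping features, nearest feature, and distance) can be illustrated for a single SNP as below. Inclusive gene coordinates, a distance of 0 for overlapping genes, measuring the gap to the closer gene end, and keeping the first gene on ties are all assumptions of this sketch; strand is not reported here. Gene names and coordinates are hypothetical.

```python
def nearest_feature(chrom, position, genes):
    """Return (overlapping genes, nearest gene, distance in bp) for one SNP.

    genes: list of (name, chrom, start, stop), coordinates inclusive (assumed).
    Distance is 0 for overlapping genes, otherwise the gap to the closer gene
    end. Ties keep the first gene listed.
    """
    overlapping, best, best_distance = [], None, None
    for name, g_chrom, start, stop in genes:
        if g_chrom != chrom:
            continue
        if start <= position <= stop:
            distance = 0
            overlapping.append(name)
        else:
            distance = start - position if position < start else position - stop
        if best_distance is None or distance < best_distance:
            best, best_distance = name, distance
    return overlapping, best, best_distance


genes = [("GENE_A", "chr1", 1000, 2000), ("GENE_B", "chr1", 5000, 6000), ("GENE_C", "chr2", 1, 100)]
print(nearest_feature("chr1", 2500, genes))
```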
To allow gene list operations such as GO Enrichment or Pathway Enrichment to be performed on the SNP list, we can set the Nearest Feature column as the gene symbol column for the spreadsheet.
Right click the spreadsheet in the spreadsheet tree
Select Properties from the pop-up menu
Select Gene symbol instead of Marker ID
Select Feature in column and select Nearest Feature (Figure 3)
Select OK
Figure 3. Setting Nearest Feature as the gene symbol allows gene list functions to be performed on a SNP list
If you have a SNP spreadsheet that was generated using Partek Genomics Suite (not imported as a .txt file), you can annotate the SNP list with gene, transcript, exon, and information about the predicted effect of the SNPs.
Select Tools from the main command toolbar
Select Annotate SNVs
A BED (Browser Extensible Data) file is a special case of a region list: it is a tab-delimited text file, and the first three columns of BED files contain the chromosome, start, and stop locations. To import a BED file to be used as a data region list, follow the import instructions for region lists. A BED file can also be visualized as an annotation file containing regions in the Genome Browser.
BED files do not contain sequences, and in the minimal three-column form the regions do not have names. For example, the UCSC Genome Browser has an annotation BED file for CpG islands. You might like to view this information in the context of a methylation microarray data set. Before you can visualize a BED file in the chromosome viewer, you must create a Partek annotation file from the BED file.
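One detail worth checking when a BED file is compared with other region lists is its coordinate convention: BED starts are 0-based and ends are exclusive. The sketch below reads the first three columns and converts them to 1-based inclusive coordinates, which is the convention assumed by the other sketches in this guide; whether a given tool expects the conversion is something to confirm. Track, browser, and comment lines are skipped, and any extra columns such as a name are ignored. The function name and sample lines are hypothetical.

```python
def read_bed(text):
    """Parse BED text into (chrom, start, stop) tuples, 1-based inclusive.

    BED stores 0-based starts and exclusive ends, so start is shifted by 1 and
    stop is kept. 'track', 'browser', and '#' lines are skipped.
    """
    regions = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        regions.append((fields[0], int(fields[1]) + 1, int(fields[2])))
    return regions


bed_text = "track name=cpg\nchr1\t0\t100\nchr1\t200\t250\tislandA\n"
print(read_bed(bed_text))
```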
Select Tools from the main toolbar
Select Annotation Manager... (Figure 1)
Figure 1. Selecting Annotation Manager
Select Create Annotation from the My Annotations tab of the Annotation Manager dialog (Figure 2)
Figure 2. Creating a new annotation file
Select BED file (.bed) for Annotation Type (Figure 3)
Figure 3. Selecting annotation file type
Select Browse... under Source to specify the BED file; a default new file name and destination will populate Result, but this can be changed
You can specify the name and save location of the new annotation file under Result; we typically choose the Microarray Libraries folder
Specify the Name of the annotation database file
Select the correct Species and Genome Build for the annotation file from the drop-down menus (Figure 4)
Figure 4. Configuring annotation file creation
Preview Chromosome Names is used when the chromosome names in the original file do not match those of the genome build and require modification. For our example, this is unnecessary.
Select OK to create the annotation
The Annotation Manager will display the new annotation in the My Annotations tab (Figure 5)
Figure 5. Viewing created annotation in My Annotations
In order to use a BED file as an Annotation track in the Genome Browser, first create the annotation file as described above, being careful to specify the correct species and genome build.
Right-click a row on any spreadsheet that has genomic features on rows (gene lists, ANOVA results, SNP detection)
Select either Browse to Row or Browse to Location to invoke the Genome Browser tab
Select New Track from the Tracks panel of the Genome Browser (Figure 6)
Figure 6. Adding a new track to the Genome Viewer
Select Add an annotation track with genomic features from a selected annotation source from the Track Wizard dialog (Figure 7)
Figure 7. Track Wizard dialog
Select Next >
Choose the annotation file you created; here we have selected UCSC CpG Islands (Figure 8)
If your annotation file does not contain strand information for each region, deselect Separate Strands; here we have deselected it
Figure 8. Choosing the annotation file
Select Create
A new track will be created from the annotation file (Figure 9). If Separate Strands had been selected, there would be two tracks, one for each strand, like we see for the RefSeq Transcripts - 2014-01-03 (+) and (-) tracks (Figure 8).
Figure 9. Viewing the added annotation file as a track in the genome viewer
Select () to close the annotation file
Select () to save the spreadsheet
The method used to detect changes in functional groups is ANOVA. For detailed information about ANOVA, see Chapter 11 of the Partek User Manual. There is one result per functional group based on the expression of all the genes contained in the group. Besides all the factors specified in the ANOVA model, the following extra terms will be added to the model by Partek Genomics Suite automatically:
Gene ID - Since not all genes in a functional group express at the same level, gene ID is added to the model to account for gene-to-gene differences
Factor * Gene ID (optional) - Interaction of gene ID with the factor can be added to detect changes within the expression of a GO category with respect to different levels of the factor, referred to in this document as the disruption of the category's expression pattern, or simply disruption
Suppose there is an experiment to find genes differentially expressed in two tissues: two different tissues are taken from each patient, and a paired-sample t-test or a 2-way ANOVA can be used to analyze the data. The GO ANOVA dialog allows you to specify the ANOVA model, which includes the two factors: tissue and participant ID. The analysis is performed at the gene level, but the result is displayed at the level of the functional group by averaging the member genes’ results. The equation of the model that can be specified is:
y = µ + T + P + ε
y: expression of a functional group
µ: average expression of the functional group
T: tissue-to-tissue effect
P: participant-to-participant effect (a random effect)
ε: error term
When the interaction of tissue with gene ID is added, the ANOVA model becomes more complicated, as shown in the model below. The functional group result is not explicitly derived by averaging the member genes, because the new model includes terms for both gene-level and group-level results:
y = µ + T + P + G + T*G + ε
y: expression of a functional group
µ: average expression of the functional group
T: tissue-to-tissue effect
P: participant-to-participant effect (this can be specified as a random effect)
G: gene-to-gene effect (differential expression of genes within the functional group independent of tissue type)
T*G: Tissue-Gene interaction (differential patterning of gene expression in different tissue types)
ε: error term
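With explicit indices, one way to write the interaction model is shown below, where i indexes tissue, j indexes participant, and k indexes the genes within the functional group. The index notation is added here only to show which terms vary with which factor; it is not taken from the Partek manual.

```latex
y_{ijk} = \mu + T_i + P_j + G_k + (TG)_{ik} + \varepsilon_{ijk}
```

The (TG)_{ik} term is the tissue-by-gene interaction: it is nonzero when gene k responds to tissue differently from the other genes in the group, which is what the guide calls disruption.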
In the case that there is more than one data column mapping to the same gene symbol, Partek Genomics Suite will assume that the markers target different isoforms and will not treat the markers as replicates of the same gene. Instead, each column is treated as a gene unto itself.
If there are only two samples in the spreadsheet, Partek Genomics Suite cannot calculate a factor-by-gene ID interaction. In this case, the result spreadsheet will contain a column labeled Disruption score. First, for each gene in the functional group, Partek Genomics Suite calculates the difference between the two samples. A z-test is then used to compare each gene's difference with those of the rest of the genes in the functional group. The disruption score is the minimum p-value from the z-tests comparing each gene to the rest of the functional group. A low disruption score therefore indicates that at least one gene behaves differently from the rest. This implies a change in the pattern of gene expression within the functional group and potential disruption of the normal operation of the group. The category as a whole may or may not exhibit differential expression in addition to the disruption.
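The disruption score description can be sketched as follows. This is one reading of the text, not Partek's code: it assumes the z statistic compares a gene's difference with the mean of the other genes' differences, scaled by their sample standard deviation, and uses a two-sided normal p-value. The function name, the handling of a zero spread, and the example values are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def disruption_score(sample_a, sample_b):
    """Minimum two-sided z-test p-value over the genes of one functional group.

    sample_a, sample_b: expression values for the same genes in the two samples.
    Each gene's difference (b - a) is compared with the mean and standard
    deviation of the differences of the other genes in the group.
    """
    diffs = [b - a for a, b in zip(sample_a, sample_b)]
    if len(diffs) < 3:
        raise ValueError("need at least 3 genes to estimate the spread of the rest")
    p_values = []
    for i, d in enumerate(diffs):
        rest = diffs[:i] + diffs[i + 1:]
        spread = stdev(rest)
        if spread == 0:
            # The other genes agree exactly: any deviation is treated as maximal
            p_values.append(1.0 if d == mean(rest) else 0.0)
            continue
        z = (d - mean(rest)) / spread
        p_values.append(erfc(abs(z) / sqrt(2)))  # two-sided normal p-value
    return min(p_values)


# One gene (the last) changes far more than the others
print(disruption_score([5, 5, 5, 5, 5], [5.1, 4.9, 5.0, 5.05, 8.0]))
```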
With gene ontology (GO) ANOVA, Partek Genomics Suite includes the ability to use rigorous statistical analysis to find differentially expressed functional groupings of genes. Leveraging the Gene Ontology database, Partek Genomics Suite can organize genes into functional groups. GO ANOVA can detect not only up- and down-regulated functional groups but also functional groups that are disrupted in a few genes as a result of treatment. Moreover, the common vocabulary of the GO effort enables this analysis to be compared across all types of gene expression data, including those from other species. Traditional tests, such as GO enrichment, require defining filtered lists of differentially expressed genes, followed by an analysis of the functional groups related to those genes. GO ANOVA, on the other hand, is performed directly after data import and normalization. This minimizes the risk that a highly stringent filter will cause important functional groups to be overlooked.
Other tests, such as gene set enrichment analysis (GSEA), tolerate minimal or no pre-filtering. However, these tests are very limited in their ability to integrate complicated experimental designs. GSEA, for example, can only handle two groups at a time. GO ANOVA, on the other hand, can leverage the wealth of sample information collected and use powerful multi-factor ANOVA statistics to analyze very complex interactions and regulatory events. The analysis output includes detailed statistical results specifying the effect and importance of phenotypic information on differential expression and subsequent disruption of Gene Ontology functional categories. Furthermore, GSEA calculates enrichment scores using a running-sum statistic on a ranked gene list. GO ANOVA takes into account more information by utilizing each sample’s expression values to calculate the enrichment score.
Note that the same principles apply to Pathway ANOVA, the only difference being the mapping file; GO ANOVA organizes genes into GO categories, while Pathway ANOVA looks at pathways.
This user guide deals with the following topics:
As these features require intensity (or count) data as well as experimental groups, they cannot be performed on imported lists.
If the data from imported spreadsheets has been associated with annotations, several approaches may be used to integrate multiple kinds of imported data.
The Genome Browser may be used to display data from multiple spreadsheets/experiments regardless of the type of spreadsheets (imported data or microarray or NGS experiments).
The Venn Diagram tool may be used to find overlaps based on a feature name.
The Find Overlapping Regions tool can use an imported gene list and a list of regions from a copy number or ChIP-Seq experiment to identify genomic regions in common.
This User Guide did not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a feature on an imported list that you think should be included in this User Guide, please let us know.
The setup dialog for GO ANOVA can be found in the Biological Interpretation section of the expression workflows (Gene Expression, MicroRNA Expression, Exon, RNA-Seq, miRNA-Seq). It is recommended that GO ANOVA be run on the spreadsheet with expression levels, after import and normalization, although GO ANOVA can be run on any spreadsheet with samples on rows and genes on columns. If a child spreadsheet is selected, such as the result of a prior ANOVA, the test will automatically be run on the parent spreadsheet.
Upon selecting GO ANOVA (Biological Interpretation > Gene Set Analysis), Partek Genomics Suite first offers the opportunity to configure the parameters of the test and to exclude functional groups with too few or too many genes (Figure 1). To save time when running GO ANOVA, the size of the GO categories analyzed can be limited using the Restrict analysis to function groups with fewer than __ genes option. Large GO categories may be less interesting and also take the most time to analyze. We recommend restricting the analysis to groups with fewer than 150 genes, as this can make the analysis much quicker and the results easier to interpret. In the current example, the maximum category size was set to 20 genes for demonstration purposes only.
Figure 1. Configure the parameters of the test: gene ontology categories with too few or too many genes can be excluded
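The size restriction amounts to a filter on category sizes. This is a minimal sketch under the following assumptions: size is counted only over genes present in the data, the lower bound is inclusive, and the upper bound is exclusive, matching the "fewer than" wording. The function name and the default minimum are hypothetical. The default maximum of 150 follows the recommendation above.

```python
def restrict_gene_sets(gene_sets: dict[str, set[str]], measured_genes: set[str],
                       min_genes: int = 2, max_genes: int = 150) -> dict[str, set[str]]:
    """Keep gene sets with at least min_genes and fewer than max_genes measured genes."""
    kept = {}
    for name, members in gene_sets.items():
        present = members & measured_genes  # count only genes present in the data
        if min_genes <= len(present) < max_genes:
            kept[name] = present
    return kept
```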
The next dialog (Figure 2) specifies the method of mapping genes to gene sets. The Default mapping file is built from annotation files from geneontology.org. Custom mapping file points to mapping files available on the local computer in the Microarray libraries directory. The Create a new mapping file from the chip's annotation file option will try to build the mapping file from the annotation file provided by the microarray vendor. Create a new mapping file from a spreadsheet enables you to create a custom mapping file from an open spreadsheet that has gene symbols in one column and gene groups in another column. Finally, files in gene matrix transposed (GMT) or gene annotation (GA) formats can also be used.
Figure 2. Setting the method of mapping genes to gene sets
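In the GMT format, each line is one gene set. The line contains tab-separated fields: the set name, a description, and then the member genes. The sketch below reads such lines into a name-to-genes mapping, which helps when checking a GMT file before selecting it. It covers only the file layout. How Partek Genomics Suite ingests GMT files internally is not described here, and `read_gmt` is a hypothetical helper.

```python
from typing import Iterable


def read_gmt(lines: Iterable[str]) -> dict[str, set[str]]:
    """Parse GMT lines: name <tab> description <tab> gene1 <tab> gene2 ..."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets
```

A file can be passed directly, for example `with open("sets.gmt") as fh: read_gmt(fh)`. The file name is illustrative.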
To set up the GO ANOVA dialog, you must consider all factors that would normally be included in an ANOVA model analyzing gene expression among the samples (Figure 3). Briefly, these should include:
Experimental factors
Factors explaining sample dependence
Factors explaining noise
For more details on ANOVA, see Chapter 11 of the User’s Manual.
Figure 3. GO ANOVA setup dialog. Including a factor in the ANOVA model (ANOVA Factors) will identify gene ontology (GO) categories whose expression is consistently affected across the genes within the category by the factor of interest. Including a factor as a Disruption Factor will identify GO categories where the expression of the genes within the category is affected, but not uniformly across the genes within the category. Genes (probesets) can be excluded based on expression levels to reduce noise.
Factors inherent to the experiment include variables that would be considered as the experimental variables during experiment design. Generally this will include all variables necessary to answer the questions of the researcher. Examples may include factors such as tissue type, disease state, treatment, or dosage.
Sometimes factors do not act independently of each other. For example, different dosages of a drug may affect patients differently over time, or a drug may not affect all tissues equally, as in many toxicity studies. If an interaction between two factors is suspected or is of particular interest, an interaction term between the two factors should be included. To do this, select both factors simultaneously by CTRL-clicking them and then select Add Interaction.
Factors to control for sample dependence include variables that account for relationships between samples. If tissues are collected in pairs from the same patient, patient ID should be included. Similarly, if tissues are collected from two distinct populations, the population variable should probably be included as well.
Noise variables may be introduced by technical processes used during sample collection and processing. Scan date and dye color are often among these variables.
Factors included in the GO ANOVA fall into two separate categories: the normal ANOVA factors (middle box) and those interacting with the gene (right-side box).
Fundamentally, you can run the GO ANOVA with the same parameters used to run a standard ANOVA analysis on gene expression data. (In other words, the middle box of the GO ANOVA is populated exactly as the normal ANOVA and the Interact with Gene box is left empty.) If such an analysis is run, the results would be similar to a standard statistical analysis, except resulting data will report on differential expression of functional categories instead of individual genes. Expression of a functional group is derived from the mean of all genes included within the group. Running GO ANOVA with the same parameters as the differential expression analysis is the most common method of running GO ANOVA. This keeps the analysis much more accessible and the results are easier to interpret.
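The group-level summary mentioned above, the mean of all genes in the group, can be sketched for one sample as follows. This illustrates only that summary, which is also what a dot plot displays. The ANOVA itself fits a model to the gene-level values. Genes missing from the sample are assumed to be skipped. `group_expression` is a hypothetical name.

```python
from statistics import mean


def group_expression(sample: dict[str, float], members: set[str]) -> float:
    """Mean expression of a functional group's genes in one sample."""
    values = [sample[gene] for gene in members if gene in sample]  # skip unmeasured genes
    if not values:
        raise ValueError("none of the group's genes were measured")
    return mean(values)
```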
There is no need to interact a factor with the gene if such an interaction is not of interest. The rightmost box in the GO ANOVA setup is optional and may be left empty in that case.
More advanced analyses can include factors that interact with the gene in the GO ANOVA model. After factors are added to the ANOVA factor(s) box, some can additionally be added to the Disruption Factor(s) box. At the mathematical level, this adds a Factor*Gene term, called a Factor-Gene interaction, to the model. At the biological level, this tests whether patterns of gene expression within the functional group are modified as a result of the factor. This alteration of gene expression patterns is referred to in this document as disruption of the functional group.
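For a single factor T (for example, tissue) and the genes G in one functional group, the model with a disruption term can be written in the textbook two-way form below. This is an illustrative sketch. Partek's exact parameterization, including any additional factors or random effects, is not documented here. The T term tests differential expression of the group as a whole, and the (TG) term tests disruption.

```latex
% y_{ijr}: expression of gene j in replicate r at level i of factor T
% Differential expression of the group:  H0: T_1 = T_2 = ... = T_a
% Disruption of the group:               H0: all (TG)_{ij} = 0
\[
  y_{ijr} = \mu + T_i + G_j + (TG)_{ij} + \varepsilon_{ijr}
\]
```

Leaving the Disruption Factor(s) box empty corresponds to dropping the (TG) term.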
For example, if comparing different tissue types, adding tissue to the middle ANOVA factor(s) box will identify entire GO functional groups that are up- or down-regulated between tissue types. If comparing nerve and muscle, this might include categories such as myosin binding or actin production, which will be wholly up-regulated in muscle because these functions are much less important to nerve function.
By interacting tissue with the gene in the model (adding tissue to the rightmost box), the interaction p-value can reveal categories where total expression might not change significantly, but the pattern of gene expression within the category is altered, or disrupted. Within a functional group, the interaction p-value reflects whether the patterns of gene expression differ between the tissues; a low p-value indicates different patterns. One example of a functional group identified by a tissue*gene interaction might be a category such as ion transfer. Ion transfer is equally important to nerve and muscle function, but the distribution of ion channels and many of the responsible genes may be quite different between the two.
Factors may sometimes be included in the Interact with Gene box even if they are not of specific interest, in the same way that factors controlling for noise are added to the middle ANOVA factors box. If any factors are included in the Disruption Factor(s) box, the more advanced model must fit the data as well as possible to give the most accurate p-values, so all factors that may alter gene expression patterns should be included. Keep in mind that GO ANOVA is not only testing the included factors for significance, but is also attempting to fit the data as a whole. As appropriate factors are added, more aspects of the data are analyzed, the model fits the true data better, and the results become more accurate.
To understand how including a Gene*Factor interaction may improve the fit of the model, consider a complex GO ANOVA design for a dose-time analysis of a drug. It may seem clear which ANOVA factors belong in the middle box: dose, time, and the dose*time interaction, to consider the effect of dose, the effect of time, and the change in the effect of dose over time. What to put in the rightmost Gene*Factor box is less clear. Adding dose alone (which is actually Dose*Gene) will check whether different drug doses affect the pattern of gene expression. Similarly, adding time to the right box (which is actually Time*Gene) will identify gene ontology categories that are affected at different times, but not uniformly across the genes. Even if these are the only questions of interest, including the interactions of the gene with both dose and time may be prudent. In general, if a factor is likely or expected to affect the distribution of gene expression within functional categories, it should be included in the Disruption Factor(s) box whenever gene distribution is being analyzed at all.
To review, including a factor in the middle box will identify GO categories whose expression is consistently affected across the genes within the category by the factor of interest. Including a factor in the right box (factor*gene) will identify gene ontology categories where the expression of the genes within the category are affected but not uniformly across the genes within the category.
GO ANOVA is not restricted to factors with only two levels. The ANOVA p-value tests the null hypothesis that all groups are equivalent. While this is useful in general, sometimes tests comparing only two sets of data are more desirable. Using contrasts to define pairwise comparisons within an ANOVA model is superior to using a test that is limited to a two-group comparison, because the contrast uses the error estimate from the full model.
To specify individual pairwise comparisons, press the Contrast button. Contrasts are performed on groups already defined in the ANOVA model. If two tissue types should be compared to each other, select the tissue term from the Select Factor/Interaction drop-down in the upper left. Select one category or a set of categories and add them to group 1 and group 2. All samples falling into group 1 will be compared to all samples falling into group 2. The output includes not only a p-value, but also a fold change, which represents the average fold change of the GO category between the two groups. Fold change is calculated as Group 1 divided by Group 2. For data in log space, the values are antilogged first, so fold change is always reported on a linear scale.
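The fold change described above can be computed from log-scale group means in a few lines, because the antilog of a difference of logs is the ratio of the antilogged values. This is a minimal sketch. It assumes the group means are plain means of log-transformed values in a known base. Partek's contrast may use model-based means instead. The function name is hypothetical.

```python
def fold_change(group1_log_mean: float, group2_log_mean: float, log_base: float = 2.0) -> float:
    """Linear-scale fold change (Group 1 / Group 2) from log-scale group means."""
    # base ** (log a - log b) == a / b, i.e. the antilogged ratio
    return log_base ** (group1_log_mean - group2_log_mean)
```

With log2 data, means of 7 and 5 give a fold change of 4. Swapping the groups gives 0.25.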
Check Exclude probe sets and differential expression p-value(s) > to filter out probe sets (genes) that are not expressed in any of the samples. The Exclude probe sets option will remove any gene whose expression does not exceed the specified limit in any sample. With the default options, this removes low-expression genes. Note that the default value of 3 is a suggestion for Affymetrix expression arrays and may not be applicable to other data sets. We suggest performing exploratory analysis and inspecting the distribution of the expression values first (e.g. View > Histogram > Row or View > Box and Whiskers > Row). The sub-checkbox, differential expression p-value(s), provides an override to the low-expression limit. A gene will be included in the analysis despite a low expression value if it has a p-value below the specified limit, suggesting that the gene is differentially expressed.
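The exclusion rule with its p-value override can be expressed as a single boolean test. This sketch is hypothetical. It assumes a gene is "expressed" when its maximum value across samples exceeds the limit, and that the override applies when the p-value is strictly below its cutoff. The default limit of 3 comes from the text. The 0.05 cutoff is only an illustrative value.

```python
from typing import Optional, Sequence


def keep_gene(values: Sequence[float], p_value: Optional[float] = None,
              min_expression: float = 3.0, p_override: float = 0.05) -> bool:
    """Hypothetical inclusion rule for one probe set in GO ANOVA.

    Excluded when not expressed above min_expression in any sample, unless
    its differential expression p-value is below p_override.
    """
    expressed = max(values) > min_expression
    differentially_expressed = p_value is not None and p_value < p_override
    return expressed or differentially_expressed
```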
Hierarchical clustering groups similar objects into clusters. To start, each row and/or column is considered a cluster. The two most similar clusters are then combined and this process is iterated until all objects are in the same cluster. Hierarchical clustering displays the resulting hierarchy of the clusters in a tree called a dendrogram. Hierarchical clustering is useful for exploratory analysis because it shows how samples group together based on similarity of features.
Hierarchical clustering is an unsupervised clustering method. Unsupervised clustering methods do not take the identity or attributes of samples into account when clustering. This means that experimental variables such as treatment, phenotype, tissue, number of expected groups, etc. do not guide or bias cluster building. Supervised clustering methods do consider experimental variables when building clusters.
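The agglomerative procedure described above can be sketched in plain Python. Each object starts as its own cluster, and the two most similar clusters are merged until one remains. The merge history is what the dendrogram draws. Average linkage with Euclidean distance is assumed here for illustration only. Partek Genomics Suite offers its own distance and linkage options, and this naive version is far slower than a production implementation.

```python
import math
from itertools import combinations


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points: dict[str, list[float]]) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering; returns merges as (left, right, distance)."""
    def linkage(c1: frozenset, c2: frozenset) -> float:
        # Average distance over all cross-cluster pairs of objects.
        total = sum(euclidean(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in points]  # start: one cluster per object
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]
    return merges
```

No sample labels enter the computation, which is what makes the method unsupervised.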
To illustrate the capabilities and customization options of hierarchical clustering in Partek Genomics Suite, we will explore an example of hierarchical clustering drawn from the tutorial Gene Expression Analysis. The data set in this tutorial includes gene expression data from patients with or without Down syndrome. Using this data set, 23 highly differentially expressed genes between Down syndrome and normal patient tissues were identified. These 23 differentially regulated genes were then used to perform hierarchical clustering of the samples. Follow the steps outlined in Performing hierarchical clustering to perform hierarchical clustering and launch the Hierarchical Clustering tab (Figure 1).
Figure 1. Heatmap showing results of hierarchical clustering
The right-hand section of the Hierarchical Clustering tab is a heat map showing relative expression of the genes in the list used to perform clustering. The heat map can be configured using the properties panel on the left-hand side of the tab. In this example, the low expression value is colored in green, the high expression value is in red, and the mid-point value between min and max is colored in black. The dendrograms on the left-hand side and top of the heat map show clustering of samples as rows and features (probes/genes in this example) as columns. Columns are labeled with the gene symbol if there is enough space for every gene to be annotated. Rows are colored based on the groups of the first sample categorical attribute in the source spreadsheet. The sample legend below the heat map indicates which colors correspond to which attribute group. In this example, Down syndrome patient samples are red and normal patient samples are orange.
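A three-color scale like the green-black-red one above can be sketched as a linear interpolation from min to midpoint to max. The sketch assumes straight RGB interpolation, clamping outside the Min/Max range, and pure green, black, and red endpoints. Partek's exact color shades and interpolation are not documented here, and the function name is hypothetical.

```python
def heatmap_color(value: float, vmin: float, vmax: float,
                  low=(0, 255, 0), mid=(0, 0, 0), high=(255, 0, 0)) -> tuple:
    """Map a value to RGB with a linear low -> mid -> high gradient."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    midpoint = (vmin + vmax) / 2
    value = min(max(value, vmin), vmax)  # clamp to the displayed range
    if value <= midpoint:
        t = (value - vmin) / (midpoint - vmin)
        start, end = low, mid
    else:
        t = (value - midpoint) / (vmax - midpoint)
        start, end = mid, high
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))
```

Changing the Min and Max values in the properties panel corresponds to changing `vmin` and `vmax` here.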
The heat map can be configured using the properties panel on the left-hand side of the Hierarchical clustering tab.
Select the Rows tab
Verify that Type appears in the annotation box
Set Width (in pixels) to 25
This will increase the width of the color box indicating sample Type.
Select Show Label
Set Text size to 12
Set Text angle to 90
This angle is relative to the x-axis. When set to 90, the text will run along the y-axis.
Select Apply
The sample attributes are now labeled with group titles (Figure 2).
Figure 2. Labeling heat map with sample attribute groups
Select the Rows tab
Select Tissue from the New Annotation drop-down menu
Select Apply
Color blocks indicating the tissue of each sample have been added to the row labels and sample legend (Figure 3).
Figure 3. Sample attributes can be added to the heat map as sample labels
By default, Partek Genomics Suite displays samples on rows and features on columns. We can transpose the heat map using the Heat Map tab in the plot properties panel.
Select the Heat Map tab
Select Transpose rows and columns in the Orientation section
Select Apply
The plot has been transposed with samples on columns and features on rows. The label for the sample groups is now in the vertical orientation because the settings we applied to Rows have been applied to Columns.
Select the Columns tab
Select the Type track
Set Text angle to 0
Select Apply
The sample group label for Type is now visible (Figure 4).
Figure 4. Heat map columns and rows can be transposed
Each cluster node, except those at the bottom level of the dendrogram, has two sub-cluster branches (legs). The order of the two branches is arbitrary, so the positions of the two sub-clusters can be flipped within the cluster. This does not change the clustering, only the position of the clusters on the plot.
Clicking on a line that represents a sub-cluster branch (dendrogram leg), or drawing a bounding box around it with the left mouse button, will swap the selected leg with the other leg of the same parent cluster. In this example, clicking on the bottom line will move it to the top of the heat map (Figure 5).
Figure 5. Rows and columns can be flipped by using Flip Mode to select dendrogram legs
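The claim that flipping changes only the display order can be checked on a toy dendrogram. In this sketch, the tree is represented as nested pairs, which is a hypothetical structure rather than Partek's internal one. Swapping the two legs of one node reorders the leaves but leaves the set of clusters unchanged.

```python
def leaf_order(node) -> list:
    """Left-to-right leaf order, i.e. the order of rows on the plot."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaf_order(left) + leaf_order(right)


def clusters(node) -> set:
    """Every cluster (as a frozenset of leaves) that the tree defines."""
    if isinstance(node, str):
        return {frozenset([node])}
    left, right = node
    return clusters(left) | clusters(right) | {frozenset(leaf_order(node))}


def flip(node):
    """Swap the two legs of one cluster node."""
    left, right = node
    return (right, left)


tree = (("A", "B"), ("C", ("D", "E")))
flipped = (tree[0], flip(tree[1]))
```

Here `leaf_order(flipped)` is `["A", "B", "D", "E", "C"]`, while `clusters(flipped) == clusters(tree)`.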
The minimum, maximum, and midpoint colors of the heat map intensity plot can be customized.
Select the Heat Map tab
Select Apply
The heat map and plot intensity legend now show maximum values in yellow and minimum values in light blue with a black midpoint (Figure 6). The data range can also be customized by changing the values of Min and Max.
Figure 6. Heat map colors for minimum, maximum, and midpoint intensity can be customized
We can use the hierarchical clustering heat map to examine groups of genes that exhibit similar expression patterns, for example, genes that are up-regulated in Down syndrome samples and down-regulated in normal samples.
Select the middle cluster of the rows dendrogram as shown (Figure 7) by clicking on the line or drawing a bounding box around the line
The lines within the selected cluster will be bold and the corresponding columns (or rows) on the spreadsheet in the analysis tab will be highlighted.
Figure 7. Selecting a dendrogram cluster using Selection Mode
Right-click anywhere in the viewer
Select Zoom to Fit Selected Rows
The same steps can be used to zoom into columns or rows. Here, we have zoomed in on rows, but not columns to show the expression levels of the selected genes for all samples (Figure 8).
Figure 8. Viewing only selected genes for all samples
Left click anywhere in the hierarchical clustering plot to deselect the dendrogram
Partek Genomics Suite can export a list of genes from any cluster selected, allowing large gene sets to be filtered based on the results of hierarchical clustering.
Select the bottom cluster of the rows dendrogram
Right-click to open the pop-up menu
Select Create Row List... (Figure 9)
Figure 9. Creating gene list from selected cluster
Name the gene set down in normal
Select OK
Save the list as down in normal
In the Analysis tab, there is now a spreadsheet row_list (down in normal.txt) containing the 6 genes that were in the selected cluster. The same steps can be used to create a list of samples from the hierarchical clustering by selecting clusters on the sample dendrogram.
Once you have created a customized plot, you can save the plot properties as a template for future hierarchical clustering analyses.
Select the Save/Load tab
Select Save current...
Name the current plot properties template; we selected Transposed Blue and Yellow
The new template now appears in the Save/Load panel as an option. To load a template, select it in the Save/Load panel and select Load selected. Note that all properties will be applied, including Min and Max values and sample groups (based on the column number of the attribute in the source spreadsheet), which may not be appropriate for a different data set.
The hierarchical clustering plot can be exported as a publication quality image.
Select the Hierarchical Clustering tab
Select File from the main toolbar
Select Save Image As... from the drop-down menu
Select a destination and name for the file
Select PNG or your preferred image type from the pull-down menu
Select Save
For Partek Genomics Suite to recognize an annotation spreadsheet, it must meet several requirements. First, there must be a column header row in the annotation file. Second, there must be a column in the annotation file that matches the identifiers in your data spreadsheet. Third, any text field above the column header row must start with #. Fourth, the text fields must be tab or comma delimited.
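The four requirements above can be checked before loading a file. This sketch is a hypothetical pre-flight check based only on those rules, plus the per-row uniqueness of identifiers noted in the steps below. Partek Genomics Suite's own parser may accept or reject files for other reasons. `check_annotation` and the example column name are illustrative.

```python
import csv


def check_annotation(text: str, id_column: str) -> list[str]:
    """Return the problems found; an empty list means these checks passed."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Rule: any text above the column header row must start with '#'.
    i = 0
    while i < len(lines) and lines[i].startswith("#"):
        i += 1
    if i == len(lines):
        return ["no column header row found"]
    header = lines[i]
    # Rule: fields must be tab or comma delimited.
    if "\t" in header:
        delimiter = "\t"
    elif "," in header:
        delimiter = ","
    else:
        return [f"first non-# line is not a tab/comma-delimited header: {header!r}"]
    rows = list(csv.reader(lines[i:], delimiter=delimiter))
    # Rule: a column must match the identifiers in the data spreadsheet.
    if id_column not in rows[0]:
        return [f"column {id_column!r} not found in header"]
    col = rows[0].index(id_column)
    ids = [row[col] for row in rows[1:] if len(row) > col]
    # Rule: identifiers must be unique to each row.
    if len(ids) != len(set(ids)):
        return [f"values in {id_column!r} are not unique"]
    return []
```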
We will illustrate associating a spreadsheet with an annotation file using an imported .txt data file from an Illumina HumanHT-12 v4.0 Gene Expression BeadChip array and the HumanHT-12 v4.0 Whole-Genome Manifest File (TXT Format) from Illumina.
Open the annotation file with a text editor such as Notepad++, WordPad, or TextEdit
Microsoft Excel is not recommended for viewing or editing text files because, on default settings, it converts some gene names to dates and others to floating-point numbers
Verify that a column in the annotation file matches the identifier in your data spreadsheet (e.g. probe ID); the identifier must be unique to each row
Remove the text before the first column header (Figure 1), or add # to the beginning of each of those lines
Save the annotation file as a .txt file
Figure 1. The HumanHT-12 v4.0 Gene Expression BeadChip annotation file contains several rows of information prior to the column header row. To use this annotation file in Partek Genomics Suite, we delete any rows prior to the column headers row.
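Editing the preamble by hand works, but the same two choices can be scripted: delete the lines or prefix them with #. This sketch assumes a tab-delimited file. It treats the header as the first line that has the ID column name as a field, such as Probe_Id in this example. The helper name is hypothetical.

```python
def prepare_annotation(lines: list[str], id_column: str, comment_out: bool = False) -> list[str]:
    """Delete (or prefix with '#') every line above the column header row.

    The header row is the first line with id_column as a tab-separated field.
    """
    for i, line in enumerate(lines):
        if id_column in line.rstrip("\n").split("\t"):
            preamble = ["#" + text for text in lines[:i]] if comment_out else []
            return preamble + lines[i:]
    raise ValueError(f"no header row containing {id_column!r}")
```

Read the manifest with `readlines()`, pass the result in, and write the returned lines to a new .txt file.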
Right-click the spreadsheet you want to annotate in the spreadsheet tree panel, select Properties from the pop-up menu (Figure 2) or select Properties from the File menu on the main toolbar
Figure 2. Changing the spreadsheet properties
Depending on how you imported the data, you may see a Configure Spreadsheet dialog (Figure 3). Select the most appropriate option for your data; here we have chosen Genomic microarray.
Figure 3. The Configure Spreadsheet dialog may appear depending on how you imported your data
The Configure Genomic Properties dialog will now open.
Select the appropriate option for Choose the type of genomic data; here we have chosen Gene Expression (Figure 4).
Figure 4. Selecting the type of genomic data
Select the appropriate options for Location of genomic features in spreadsheet
Selecting Gene Symbol instead of Marker ID allows biological interpretation tasks like GO Enrichment or Pathway Enrichment to be performed without an annotation file because the gene symbol can be used to look up the gene set or pathway database.
Location of genomic features in spreadsheet allows you to specify whether genomic features (e.g. genes, miRNAs, probes, SNPs, CpGs etc) are represented by columns or rows. For Feature in column label, each feature is on a column, each row is a sample. For Feature in column, each feature is on a row and the feature ID for each feature is located in the column chosen with the drop-down menu.
Choose chips/reference and annotation files allows you to specify an annotation file to associate with the spreadsheet.
Select Browse... from Choose chips/references and annotation files
Select your annotation spreadsheet file using the file selection interface
If the genomic position information from the annotation file cannot be automatically parsed, the Configure Annotation dialog will launch. This dialog allows you to choose which columns in the annotation file give the identity and genomic location of the features in your data spreadsheet. There are four options depending on if and how chromosome coordinates are described in the annotation file.
Select the appropriate option for your annotation file; we have selected Chromosome is in one column and the physical position is in another column (eg: chr1, 100 or chr1, 100-200)
The Choose the columns section displays the annotation file spreadsheet with options to choose which columns are the Marker ID, Chromosome, and Physical Position (Figure 5).
Select the column that matches the feature IDs in your data spreadsheet for Marker ID; we have chosen Probe_Id for Marker ID.
Select the column(s) that match the chromosome location data; we have chosen Chromosome for Chromosome and Probe_Coordinates for Physical Position.
Select Close to return to the Configure Genomic Properties dialog
An index file for the genomic location data of the annotation file is generated in the same folder as the annotation file; it has the same file name as the annotation file, but the file extension .idx. If you need to re-configure the genomic location field in the annotation file, first manually delete the .idx file and re-do the above steps to generate a new index file for the annotation file.
Figure 5. Specifying the columns that contain the genomic locations of markers in the annotation file
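The option chosen above describes a chromosome column plus a position column holding either a single position (100) or a range (100-200). This sketch parses that layout so a file can be checked before configuration. It is a hypothetical helper. Values with any other layout raise an error instead of being guessed at.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    chromosome: str
    start: int
    stop: int


_POSITION = re.compile(r"^\s*(\d+)\s*(?:-\s*(\d+))?\s*$")  # "100" or "100-200"


def parse_location(chromosome: str, position: str) -> Location:
    """Parse a chromosome column and a position column into start/stop."""
    match = _POSITION.match(position)
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    start = int(match.group(1))
    stop = int(match.group(2)) if match.group(2) else start  # single position
    return Location(chromosome.strip(), start, stop)
```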
The Chip/Reference text field will be populated with the annotation file name. You can edit this text field if you wish.
For the Annotation column with gene symbols or miRNA names section, if Gene symbol instead of Marker ID is selected, this field is automatically populated with the gene symbol column; if it is not selected, you will need to manually specify the column in the annotation file that contains gene symbols or miRNA names.
Select Set Column:
Select the appropriate column from the dialog; here we have selected ILMN_Gene (Figure 6)
Select OK
Figure 6. Choosing the annotation column with gene symbols
Species and gene symbol information is required for biological interpretation analysis.
Select the correct species and genome build from the drop-down menus; we have chosen Homo sapiens and hg19 (Figure 7)
Select OK to apply the annotation file to your data spreadsheet
Figure 7. Choosing annotation file using the Configure Genomic Properties dialog
To verify that the annotation has been added, try adding annotation information to the spreadsheet when the features are on rows in the spreadsheet.
Right-click on a column in the annotated data file spreadsheet
Select Insert Annotation from the pop-up menu (Figure 8)
Figure 8. Adding an annotation column to data spreadsheet
The Column Configuration section of the Add Rows/Columns to Spreadsheet dialog should contain all the feature annotations from the annotation file spreadsheet (Figure 9). Here we selected ILMN_Gene, which will add gene name information as a column next to 1. ID_REF.
Figure 9. Annotations from the annotation spreadsheet file should appear as options in the Column Configuration section of the Add Rows/Columns to Spreadsheet
Annotation files for most commercial arrays are available from the chip manufacturer. If you have a custom chip or want to use a customized annotation file, you can create an annotation file that will allow you to add annotations to your features (e.g. probe IDs) when the features are represented by rows on the spreadsheet. Your annotation file must meet the following criteria:
The annotation file must have a column header with a label for each column
A column in the annotation file must correspond to the feature ID column of your data spreadsheet
Any comments before the header must start with # or the header will not be recognized
The fields of the annotation file must be tab or comma delimited
To invoke a genome view of your data, your annotation file must also contain the genomic location of each feature, in one or more columns and in a format that Partek Genomics Suite can recognize. This means the chromosome and the base-pair location (start and stop, or a single physical position). Cytoband and/or strand can also be included.
The table below provides possible column labels, a description of the format for that field, and an example.
Here are a few examples of the first two rows of annotation files:
Using Agilent format
Using Affymetrix SNPs format
Using Affymetrix exons format
Preparing a data set for analysis requires importing the data, normalizing the data as appropriate for standard gene expression analysis, and inserting columns containing the experimental variables. Checkout for more details about preparing data. It is not necessary to perform a differential analysis of gene expression before GO ANOVA.
For the sake of example, the following walkthrough will consider an experiment that has been imported which includes two different tissues, brain tissue and heart tissue, extracted from a small set of patients.
The GO ANOVA function is available in the Gene Expression, microRNA Expression, RNA-Seq, and miRNA-Seq workflows.
Select the Gene Expression workflow (or any of the other ones) from the Workflows drop-down menu on the upper right of the spreadsheet
Go to the Biological Interpretation section of the workflow
Select Gene Set Analysis (Figure 1) and then Gene Set ANOVA
Figure 1. GO ANOVA dialog can be invoked via Gene Set Analysis option of the workflow
For this example analysis, the model was kept easy to interpret by including Subject and Tissue as the only ANOVA factors. Additionally, Tissue was added to the Disruption Factor(s) box. Including Subject controlled for person-to-person variation, and including Tissue allowed the analysis of differential expression and of functional category disruption between tissue types. For simplicity and to minimize run time, Subject was not added to the Disruption Factor(s) box. Including it would have helped correct for subject-specific gene expression patterns, though the results were largely unaffected in this case.
Performing GO ANOVA on very large GO categories can take quite a bit of time. More importantly, very large categories may be too broad in scope to be useful. To speed up the operation and analyze only smaller GO categories, specify 20 genes as the maximum size for an analyzed GO category.
For the sample dataset, the GO ANOVA dialog setup should appear as in Figure 2 below.
Figure 2. GO ANOVA configured for the user guide data set. Two factors added to the model
There are two main visualizations for use with GO ANOVA outputs:
Dot plots used to visualize differential expression of functional groups
Profile plots used for visualizing disruption of gene expression patterns within the group
Dot plots represent each sample with a single dot. The position of each dot is calculated as the average expression of all genes included in the functional group. Invoke this plot by right-clicking on the row header of a functional group of interest and choosing Dot Plot (Orig. Data). The color, shape, and size of the dots can be set to represent sample information in the plot properties dialog, invoked by selecting the red ball in the upper left.
Figure 1 shows a dot plot for a GO category "cell growth involved in cardiac muscle cell development", which is expressed in the heart at a level of almost four times that of the brain, evidenced by the difference of just under two units on the y-axis (in the current example the values on the y-axis are shown in log2 space). Note that the replicates are grouped neatly, making this category highly significant. That is not a surprise, given that the genes belonging to that category are likely very specific for the heart.
Figure 1. Dot plot of a significantly differentially expressed GO category. Each dot is a sample, box-and-whiskers summarize groups
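The arithmetic behind this dot plot can be sketched in a few lines of Python. This is a minimal illustration, not Partek's implementation. It assumes that the values are already in log2 space, as in the example above, computes each sample's dot as the mean of the category's genes, and converts the log2 difference between group means into a linear fold change. The gene names and values are hypothetical.

```python
from statistics import mean


def category_score(sample_log2: dict, genes: list) -> float:
    """One dot on the plot: mean log2 expression of the category's genes in one sample."""
    return mean(sample_log2[g] for g in genes)


def fold_change_from_log2_diff(diff: float) -> float:
    """Linear fold change implied by a difference of `diff` units on a log2 scale."""
    return 2.0 ** diff


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B"]  # hypothetical category members
    heart = [{"GENE_A": 8.0, "GENE_B": 10.0}, {"GENE_A": 8.2, "GENE_B": 9.8}]
    brain = [{"GENE_A": 6.1, "GENE_B": 8.1}, {"GENE_A": 6.0, "GENE_B": 8.2}]
    heart_mean = mean(category_score(s, genes) for s in heart)
    brain_mean = mean(category_score(s, genes) for s in brain)
    diff = heart_mean - brain_mean
    print(f"log2 difference: {diff:.2f}, fold change: {fold_change_from_log2_diff(diff):.2f}")
```

With these made-up values, the two groups differ by 1.90 log2 units. This corresponds to a fold change of about 3.73, which matches the "just under two units, almost four times" reasoning above.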
Profile plots, or profiles, represent each category (level) of one of the GO ANOVA factors as a separate line, so the plot shows a few overlapping lines. Horizontal coordinates refer to individual genes or probes in the original data, and vertical coordinates represent the expression of each gene. Invoke this plot by right-clicking on the row header of a functional group of interest and choosing Profile (Orig. Data). The plot is useful because it displays the pattern of gene expression in the group as a line. If the pattern is conserved across treatments, the lines will lie parallel. If some genes react differently, the lines will follow different patterns and may even cross each other.
The profile plot in Figure 2 visualizes a GO category without differential expression but with significant disruption. Note that the gene TNNI3 is up-regulated in the heart, while STX1A is down-regulated in the heart.
Figure 2. Profile plot of a GO category with significant disruption but not differential expression. Each data point is a gene (error bars are standard error of the mean)
This user guide illustrates:
This user guide assumes the user is familiar with the hierarchy of spreadsheets and analysis in Partek Genomics Suite.
Many plots available in Partek Genomics Suite are not discussed in this user guide. A more thorough review of Partek Genomics Suite visualizations can be found in Chapter 6: The Pattern Visualization System of the Partek User's Manual available from Help > User’s Manual in the Partek Genomics Suite main toolbar.
There is no specific data set for this tutorial. You may use one of your own microarray experiments or use a data set from one of our tutorials.
Visualizations are generated using data from a spreadsheet. Some visualizations allow interactive filtering on the plot, but others do not. If you only wish to include certain rows or columns in a visualization, you may need to create a spreadsheet with only the rows or columns of interest by applying a filter and cloning the spreadsheet.
In general, probe(set)/gene intensity values may be visualized from either an ANOVA spreadsheet or a filtered ANOVA spreadsheet. Because intensity data is stored in the parent spreadsheet, the parent and child spreadsheets should be visible in the spreadsheet navigator with the appropriate parent/child relationship (Figure 1).
Figure 1. Down_Syndrome-GE is the parent spreadsheet; ANOVAResults and A are child spreadsheets of Down_Syndrome-GE
When looking for simple differential expression, it is ideal to sort the factor p-values in ascending order. This finds the groups whose genes, taken together, differ most significantly. To find groups that are less likely to be called by chance, consider filtering to groups with a minimum of 4 or 5 genes (Figure 1). Simple filters can be applied with the interactive filter, available from the button on the toolbar at the top of the screen.
If there is more than one factor in the model, you can specify more complex criteria that combine the factors on the Advanced tab of Tools > List Manager. For example, to find categories that are significant and changed by at least two-fold, create two criteria, one for a low p-value and the other for a minimum two-fold change, and take the intersection of the two criteria.
Figure 1. Top ten functional groups sorted by the Tissue p-value after filtering to a minimum of five genes in the GO category. Note that most of the groups can be directly related to heart muscle
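The List Manager criteria described above amount to set operations on the result rows. The Python sketch below reproduces that logic outside Partek Genomics Suite, but it is only an illustration. The row fields (`name`, `n_genes`, `p_value`, `fold_change`) and the thresholds are hypothetical, not the actual column names of a GO ANOVA spreadsheet. The sketch also assumes that fold change is reported as a signed value, for example -2 for a two-fold decrease.

```python
from dataclasses import dataclass


@dataclass
class CategoryResult:
    name: str
    n_genes: int        # number of genes in the GO category
    p_value: float      # factor p-value, e.g. Tissue
    fold_change: float  # signed fold change (assumed: -2.0 means two-fold down)


def select_categories(rows, max_p=0.05, min_abs_fc=2.0, min_genes=5):
    """Intersect a p-value criterion with a fold-change criterion, then sort by p-value."""
    big_enough = [r for r in rows if r.n_genes >= min_genes]
    low_p = {r.name for r in big_enough if r.p_value <= max_p}
    large_fc = {r.name for r in big_enough if abs(r.fold_change) >= min_abs_fc}
    keep = low_p & large_fc  # the intersection of the two criteria
    return sorted((r for r in big_enough if r.name in keep), key=lambda r: r.p_value)
```

Categories that are too small, or that pass only one of the two criteria, drop out. The survivors are listed from the lowest p-value upward, as in Figure 1.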
If disruption (the factor*gene interaction) is tested, the filters can become more complicated. Complex filters are needed mainly because an entire functional group is not expected to behave the same way, especially in larger groups. Looking back at Figure 1, notice that the low values in column 7 are present because not every gene is equally differentially expressed, even in the most differentially expressed groups. When there is significant differential expression, disruption is also likely. At least one gene probably participates in a role beyond that of the functional group and will not follow the pattern of the rest of the group. This situation is expected and leads to a new type of filter.
Filtering for low p-values on the factor and then filtering for low p-values on the factor interacted with gene will find groups that are differentially expressed, but contain at least a few genes that are either disrupted due to treatment, or simply are involved in additional functional groups beyond the scope of the one being analyzed. This list often contains some of the more informative big picture functional groups.
Figure 2. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category. By prioritizing by the disruption column this type of a list is more "big picture"
If you are looking for disruption in groups that are not strongly differentially expressed but instead express different genes under different treatments, filter for low disruption p-values and high factor p-values. As shown in Figure 2, large or diverse groups that are differentially expressed often exhibit significant disruption. In fact, a group that is differentially expressed but includes even a single unchanged gene will have very significant disruption. These situations are notable, but they are distracting if you are looking for functional groups that are uniquely patterned by treatment. After you filter out groups with low factor p-values, the remaining groups with low disruption p-values usually have very distinct patterns of expression (Figure 3).
Figure 3. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category and a minimum Tissue p-value of 0.3. This list is especially interesting because detecting such categories with enrichment analysis alone would be laborious.
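The two disruption-based filters described above can be sketched in Python. The five-gene minimum and the 0.3 factor p-value threshold come from Figure 3. The 0.05 cutoff for "low" p-values and the field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GoAnovaRow:
    name: str
    n_genes: int
    factor_p: float      # e.g. Tissue p-value
    disruption_p: float  # e.g. Disruption(Tissue) p-value (factor*gene interaction)


def big_picture(rows, alpha=0.05, min_genes=5):
    """Differentially expressed AND disrupted groups, ranked by disruption p-value."""
    hits = [r for r in rows
            if r.n_genes >= min_genes and r.factor_p <= alpha and r.disruption_p <= alpha]
    return sorted(hits, key=lambda r: r.disruption_p)


def distinct_patterns(rows, alpha=0.05, min_factor_p=0.3, min_genes=5):
    """Disrupted groups WITHOUT overall differential expression, ranked by disruption p-value."""
    hits = [r for r in rows
            if r.n_genes >= min_genes and r.factor_p >= min_factor_p and r.disruption_p <= alpha]
    return sorted(hits, key=lambda r: r.disruption_p)
```

The first function corresponds to the "big picture" list in Figure 2. The second corresponds to Figure 3, where groups with low factor p-values are excluded before ranking by disruption.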
The primary use of the dot plot is visualizing intensity values across samples.
We will invoke a dot plot from a gene list child spreadsheet with genes on rows.
Right-click on the row header of the gene you want to visualize
Select Dot Plot from the pop-up menu (Figure 1)
Figure 1. Creating a dot plot of gene intensity values
A dot plot will be displayed in a new tab (Figure 2).
Figure 2. Simple dot plot of a single gene that shows the distribution of intensities across all samples
There are many customizations that can be made to this simple plot.
Figure 3. Configuring the data shown on the plot
The Configure Plot dialog lets you change how the data is displayed on the plot. We will make a change to illustrate the possibilities.
Set Group by to 4. Tissue using the drop-down menu
This allows us to group the samples by any categorical attribute. These attributes are specified in the parent spreadsheet.
Select OK to modify the plot
We could also have changed the grouping of samples using the Group by drop-down menu above the plot.
The order of the group columns is alphabetical by default, but can be changed to match the spreadsheet order by selecting Categoricals in spreadsheet order in the Configure Plot dialog (Figure 3).
Figure 4. Changing the appearance of a dot plot using the plot properties dialog
The Plot Properties dialog lets you change the appearance of the plot. We will make a few changes to illustrate the possibilities.
Set Shape to 3. Type using the drop-down menu
Select the Box&Whiskers tab
Set Box Width to 15 pixels
Select the Titles tab
Set X-Axis under Configure Axes Titles to Tissue
Select OK to modify the plot
Alternatively, we could have changed the shapes using the Shape by drop-down menu above the plot. The dot plot now shows four columns, each with a thinner box-and-whiskers plot, and uses different shapes for different sample types (Figure 5).
Figure 5. The Dot Plot can be modified to optimally visualize your data
Like many visualizations in Partek Genomics Suite, the dot plot is interactive.
Legends can now be dragged and dropped to new locations on the plot. Samples can be selected by left-clicking the sample or left-clicking and dragging a box around samples.
Left-click and drag to move around the plot.
The volcano plot displays p-values and fold-changes of numerous genomic features (e.g., genes or probe sets) at the same time. This allows differentially expressed genes to be quickly identified and saved as a gene list.
Note: the same list can be generated without a visual aid using the List Manager (ANOVA Streamlined tab).
We will invoke a volcano plot from an ANOVA results child spreadsheet with genes on rows.
Select View from the main toolbar
Select Volcano Plot (Figure 1)
Figure 1. Invoking a volcano plot on an ANOVA results spreadsheet
The Volcano Plot Configure dialog will open (Figure 2).
Figure 2. Select the columns to display in the volcano plot
Select the fold-change and p-value columns you would like to visualize from the ANOVA results spreadsheet; here we have chosen 12. Fold-Change(Down Syndrome vs. Normal) for the X Axis and 10. p-value(Down Syndrome vs. Normal) for Y Axis
Select OK
Figure 3. The volcano plot shows each probe(set)/gene as a point. The X Axis shows fold change with no change (N/C) as the mid-point. The Y Axis shows p-values in descending value from a maximum of 1 at the X Axis intersection.
To facilitate analysis, we can add cutoff lines for both fold-change and p-value.
Select the Axes tab
Select Set Cutoff Lines (Figure 4)
Figure 4. Adding cutoff lines to the volcano plot
Set Vertical Line(s) to 1.3 and -1.3
Set Horizontal Line(s) to 0.05
Select Select all points in a section
Select OK (Figure 5)
Figure 5. Setting cutoff lines. The vertical lines are fold-change cutoffs. The horizontal line is a p-value cutoff.
Select OK to close the Plot Rendering Properties dialog
The volcano plot now has cutoff lines for fold-change and p-value (Figure 6).
Figure 6. Cutoff lines facilitate visual analysis of ANOVA results
Because we selected Select all points in a section when adding the cutoff lines, selecting any of the quadrants will select all probe(sets)/genes in that quadrant. If this option is not selected, you can select individual probe(sets)/genes or groups using Selection Mode. Gene lists can be generated from the selected probe(sets)/genes.
If columns are selected in the ANOVA results source spreadsheet for the volcano plot, only those columns will be included in the created list.
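The following Python sketch shows which points fall into which section of the volcano plot, using the cutoffs set above (fold change ±1.3, p-value 0.05). It is illustrative only. It assumes that fold change is signed, so a value of -1.3 means a 1.3-fold decrease, and that points exactly on a cutoff line count as passing. The gene names in the usage example are hypothetical. Because the Y axis places small p-values at the top, the upper right-hand quadrant contains significant, up-regulated probe(sets)/genes.

```python
def volcano_section(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.05):
    """Return (direction, significance) for one probe(set)/gene."""
    if fold_change >= fc_cutoff:
        direction = "up"
    elif fold_change <= -fc_cutoff:
        direction = "down"
    else:
        direction = "unchanged"
    significance = "significant" if p_value <= p_cutoff else "not significant"
    return direction, significance


def upper_right(genes, fc_cutoff=1.3, p_cutoff=0.05):
    """Names of (name, fold_change, p_value) entries that are up-regulated and significant."""
    return [name for name, fc, p in genes
            if volcano_section(fc, p, fc_cutoff, p_cutoff) == ("up", "significant")]
```

For example, `upper_right([("G1", 2.0, 0.01), ("G2", 1.1, 0.001), ("G3", -3.0, 0.001)])` keeps only G1. G2 does not reach the fold-change cutoff, and G3 is down-regulated.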
Select the upper right-hand quadrant of the volcano plot
Right click the selected quadrant
Select Create List (Figure 7)
Figure 7. Creating a gene list from a volcano plot
Give the new list a name and description as appropriate
Select OK
The list will be saved as a text file and open as a child spreadsheet in the Analysis tab.
Sort Rows by Prototype is a function that can identify genes with similar expression patterns. For example, if a gene with an interesting expression pattern has been detected, using Sort Rows by Prototype makes it possible to find other genes that have a similar pattern of intensity values. Although this is most commonly used for changes in gene expression over a time course, it can be applied to other experimental designs as well.
To invoke Sort Rows by Prototype, probe(sets)/genes must be on rows. If you want to use this tool to analyze the main intensity values spreadsheet, the spreadsheet must be transposed prior to analysis. A common way to view and analyze gene expression in a time-series experiment is to include means or LS means in the ANOVA spreadsheet.
Configure the ANOVA dialog to include the factor or interaction of interest
Select Advanced... from the ANOVA dialog
Select LS-Mean or Mean
Use the drop down menus to select the factors or interaction you want the LS mean / mean of
Select Add for each
Select OK (Figure 1)
Figure 1. Using Advanced ANOVA setup to include group means in the ANOVA output
Select OK to close the ANOVA configuration dialog and open the ANOVA spreadsheet
The Sort Rows by Prototype function uses every non-text column in a spreadsheet to build and compare patterns; any columns you do not want to include in the pattern similarity analysis need to be removed before running the function.
If you want to preserve the ANOVA spreadsheet contents, clone the ANOVA spreadsheet prior to deleting columns.
Select columns you want to remove
Right-click on one of the selected column headers
Select Delete from the pop-up menu
We can now invoke Sort Rows by Prototype on the modified spreadsheet.
Select Tools from the main toolbar
Select Discovery
Select Sort Rows by Prototype... (Figure 2)
Figure 2. Invoking Sort Rows by Prototype on spreadsheet with LS mean values for conditions/time points
The Sort Rows by Prototype dialog will launch (Figure 3).
Figure 3. Sort Rows by Prototype dialog
This dialog allows you to configure the pattern, or prototype, that Sort Rows by Prototype will compare all probe(sets)/genes to.
The Select Dissimilarity Measure drop-down menu allows you to choose from a wide variety of parametric and non-parametric measures of dissimilarity.
After configuring the prototype and selecting a dissimilarity measure, select Sort to run the function
Select Cancel to close the dialog
A new column 1 will be added to the spreadsheet and the rows will be reordered (Figure 4). The new column contains the dissimilarity score for each row; the lower the value, the more similar the row is to the prototype. The row with the highest similarity to the prototype is listed first, with the other rows listed in descending similarity to the prototype.
Figure 4. Result of sorting by prototype. The prototype gene is in the first row, while the other genes are listed based on their similarity to the prototype gene. Smaller proximity values imply more similarity to the selected shape
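To make the idea concrete, here is a minimal Python sketch of sorting rows by their dissimilarity to a prototype. It implements two common measures, Euclidean distance and 1 minus the Pearson correlation, as stand-ins. The exact definitions behind the options in the Select Dissimilarity Measure menu may differ. Row names and values are hypothetical.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance: sensitive to both shape and absolute level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_minus_pearson(a, b):
    """1 - Pearson correlation: 0 for identical shape, 2 for a mirror-image shape."""
    ma, mb = mean(a), mean(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0:
        return math.inf  # a flat row has no defined correlation; rank it last
    return 1 - sum(x * y for x, y in zip(da, db)) / den


def sort_by_prototype(rows, prototype, measure=one_minus_pearson):
    """Return (row name, dissimilarity) pairs, most similar (lowest score) first."""
    scored = [(name, measure(values, prototype)) for name, values in rows.items()]
    return sorted(scored, key=lambda item: item[1])
```

The choice of measure matters. With a rising prototype `[1, 2, 3, 4]`, 1 minus Pearson ranks a scaled copy `[2, 4, 6, 8]` first because only the shape counts. Euclidean distance instead ranks `[1, 2, 3, 5]` first because it is closest in absolute values.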
To view the results, we can generate a profile plot of several of the rows. For example, here we will show the top five most similar probe(sets)/genes.
Select the row headers of the top 5 rows by selecting each while holding the Ctrl key or selecting the first then fifth while holding the Shift key
Select View from the main toolbar
Select Profiles
Select Row Profiles
Select Select for both Plots and X-Axis in the Configure Data Source dialog
The profile plot will open as a new tab (Figure 5).
Figure 5. Profile plot of 5 probe(sets)/genes most similar to the prototype used in Sort rows by prototype
A scatter plot is a simple way to visualize differentially expressed genes. It plots the gene expression values of two samples at a time. Most probe(sets)/genes fall near the 45° line, while up- or down-regulated genes are positioned above or below it.
To draw a scatter plot, you first need to transpose the original intensities spreadsheet so that the samples are on columns and probe(sets)/genes are on rows.
Select the main spreadsheet
Select Transform from the main toolbar
Select Create Transposed Spreadsheet...
Select the column with sample IDs from the drop-down menu
Select OK
A new temporary spreadsheet will be created with probe(sets)/genes on rows and samples on columns.
Select the two sample columns you would like to compare
Select View from the main toolbar
Select Scatter Plot (Figure 1)
Figure 1. Invoking a scatter plot from a spreadsheet with probe(sets)/genes on rows and samples on columns
Select Yes when asked if you want to only use the selected columns
Select Yes when asked if you are sure you would like to draw the scatter plot
The scatter plot will open in a new tab. We can add a regression line to the plot.
Select Axes
Select Set Regression Lines
Select Regression line of y on x
Set Line Width to 5
Select OK (Figure 2)
Figure 2. Configuring a regression line
Select OK to close the Plot Rendering Properties dialog
The scatter plot now features a regression line dividing the probe(sets)/genes (Figure 3).
Figure 3. Each dot on the plot represents the intensity value of a probe(set)/gene
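A regression line of y on x is conventionally the ordinary least-squares line that minimizes vertical distances to the points. The Python sketch below shows that calculation. It assumes this is what the Regression line of y on x option draws, and it is not taken from Partek's code.

```python
from statistics import mean


def regress_y_on_x(x, y):
    """Ordinary least-squares line y = slope * x + intercept (minimizes vertical residuals)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx
```

When two samples have similar overall expression, the fitted slope is close to 1 and the intercept close to 0, so the line approximates the 45° line. On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept.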
The MA plot can be used to display a difference in expression patterns between two samples. The horizontal axis (A) shows the average intensity while the vertical axis (M) shows the intensity ratio between the two samples for the same data point. In essence, an MA plot is a scatter plot tilted to the side so that the differentially expressed probe(sets)/genes are located above or below the 0 value of M. An MA plot is also useful to visualize the results of normalization where you would hope to see the median of the values follow a horizontal line.
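The M and A coordinates can be computed directly from the two samples' values. The Python sketch below assumes the intensities are already log2-transformed. In that case, M is the difference of the two log2 values (the log2 ratio) and A is their average. For linear-scale intensities, you would take log2 of each value first. The median check reflects the normalization point above: after good normalization, the bulk of M values should sit around 0.

```python
from statistics import median


def ma_values(sample1_log2, sample2_log2):
    """Return (M, A) lists for two samples given log2 intensities of the same features."""
    m = [a - b for a, b in zip(sample1_log2, sample2_log2)]        # log2 ratio
    a = [(x + y) / 2 for x, y in zip(sample1_log2, sample2_log2)]  # average log2 intensity
    return m, a


def median_m(sample1_log2, sample2_log2):
    """Median of M; values near 0 suggest the two samples are well normalized."""
    m, _ = ma_values(sample1_log2, sample2_log2)
    return median(m)
```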
The MA plot is invoked on the original intensities spreadsheet without any need for transposition.
Select View from the main toolbar
Select MA Plot
The MA plot will launch in a new tab showing the first two rows as the comparison (Figure 4).
Figure 4. MA plot comparing the expression levels between two samples. Each dot on the plot represents a single genomic feature (gene or probe set). The average signal for each genomic feature is shown on the horizontal axis (A), while the ratio is shown on the vertical axis (M).
The samples displayed can be changed using the select sample menus on the left-hand side.
Select () from the Mouse Mode icon set to activate Flip Mode
Set Min color to () using the color picker tool
Set Max color to () using the color picker tool
Select () from the Mouse Mode icon set to activate Selection Mode
To reset zoom select () on the y-axis to show all rows and the x-axis to show all columns.
Select () on the y-axis to show all rows
Select () from the Mouse Mode icon set to activate Selection Mode
Select Configure Plot () from the plot command bar to launch the Configure Plot dialog (Figure 3).
Select Plot Properties () from the plot command bar to launch the Plot Properties dialog (Figure 4)
Select () to activate Selection Mode
Select () to activate Zoom Mode
Left-clicking on a region will zoom in on it. The zoom level can be reset by selecting ().
After zooming in, select () to activate Pan Mode
Select () to move between rows on the source spreadsheet
Select () to swap the horizontal and vertical axes
The volcano plot will open in a new tab (Figure 3). Control and color options for the volcano plot are largely similar to those described for a dot plot. On volcano plots with many probe(sets)/genes, the shapes and sizes of individual probe(sets)/genes will not be visible until they are selected.
Select ()
Select () from the main command bar to save the modified spreadsheet
The Pattern Type options () allow preset shapes to be applied to the prototype within the range specified by the Begin, End, Min, and Max parameters. The final option From Row allows you to select any row number in the spreadsheet to serve as the prototype. This is a useful option if you have a particular gene of interest and want to find other genes with similar expression profiles in your data set. You can also manually configure the prototype by dragging the points.
Select () from the plot command bar

Column label | Description of format | Example |
---|---|---|
chromosome | a chromosome label | 3 |
start | an integer, the start position (in base pairs) of the feature | 69871322 |
stop | an integer, the stop position (in base pairs) of the feature | 70100176 |
genomic_coordinates | chromosome:start-stop | 3:69871322-70100176 |
strand | + for top, - for bottom | + |
physical position | an integer, the position (in base pairs) of the feature | 70100176 |

ProbeID | GeneName | GenomicCoordinates | Cytoband |
---|---|---|---|
A_44_P1025812 | TC521361 | chr12:2546883-2546824 | rn |

Probe Set ID | Chromosome | Physical Position | Strand | Cytoband |
---|---|---|---|---|
SNP_A-1512540 | 9 | 22205296 | - | p21.3 |

probeset_id | seqname | strand | start | stop |
---|---|---|---|---|
2315588 | chr1 | + | 1155398 | 1155624 |
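If you prepare annotation files outside Partek Genomics Suite, a short script can check values in the genomic_coordinates format (chromosome:start-stop, for example 3:69871322-70100176). The Python sketch below is a hypothetical helper, not part of Partek Genomics Suite. It splits such a string into its parts and rejects malformed values. It does not require start to be smaller than stop, because the GenomicCoordinates example chr12:2546883-2546824 lists the larger position first.

```python
import re

# chromosome:start-stop, e.g. "3:69871322-70100176"
_COORDS = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<stop>\d+)$")


def parse_genomic_coordinates(text):
    """Split a genomic_coordinates string into (chromosome, start, stop)."""
    match = _COORDS.match(text.strip())
    if match is None:
        raise ValueError(f"not in chromosome:start-stop format: {text!r}")
    return match["chrom"], int(match["start"]), int(match["stop"])
```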
The GenomeStudio plug-in lets you export data into a project that opens directly in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina data into Partek Genomics Suite.
The Manhattan plot is a common way to visualize p-values or log-odds ratios for GWAS studies across genomic coordinates.
The starting point for a Manhattan plot is a spreadsheet with SNPs on rows and p-values or log-odds ratios in a column. If beginning with p-values, you will need to convert the p-values to -log10(p-value).
Select the column with p-values
Select Transform from the main toolbar
Select Normalization & Scaling
Select On Columns...
In the Normalization tab, set Base of the Log(x + offset) to 10
Select OK
Go to Transform > Normalization & Scaling > On Columns... again
Select the Add/Mul/Sub/Div tab
Set Multiply by Constant to -1
Select OK
The column now contains -log10(p-value).
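The two transformations above, log base 10 followed by multiplication by -1, combine into -log10(p). The Python sketch below shows the same arithmetic. It assumes an offset of 0 in the Log(x + offset) step. A p-value of exactly 0 would make the logarithm undefined, so the sketch rejects it.

```python
import math


def neg_log10(p_value, offset=0.0):
    """-log10(p + offset): the log base 10 step followed by multiplying by -1."""
    if p_value + offset <= 0:
        raise ValueError("p-value plus offset must be positive")
    return -1.0 * math.log10(p_value + offset)
```

Smaller p-values become taller points: p = 0.01 maps to 2 and p = 0.001 maps to 3. This is why the most significant SNPs rise highest on the Manhattan plot.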
We can now invoke the initial plot.
Select View from the main toolbar
Select Genome View
The Genome View tab will open. This plot will need to be configured.
Select the Profiles tab
Remove any unwanted profiles
Select Add profile
Select Column
Select the column with the -log10(p-value) or log-odds ratio values from the drop-down menu
Select Value for Color by
Select point from the Style drop-down menu
Select OK to add the profile
Select OK to close the Configure Plot Properties dialog
The plot will now show a Manhattan plot (Figure 1).
Figure 1. Customized Genome View showing genomic locations on the x-axis and -log10(p-values) of SNPs on the y-axis (Manhattan plot). Each dot represents a single SNP. The cytoband is shown along the bottom of the plot
It is also possible to display multiple chromosomes at the same time.
Select Show All in the upper-right hand corner of the plot
This displays all chromosomes vertically. We can display them horizontally for a better view.
Select Genome in line for Layout
Select OK
To further improve the genome-wide view, we can remove the cytoband, remove the genomic position label, color points by chromosome, and increase point size.
Select Cytoband in the upper right-hand corner
Select the Axes tab
Deselect Show Base Pair Labels
Select Profiles
Select Configure
Set Color By to a column that holds the chromosome of each SNP/locus as a category
Set Shape Size to 5.0
Select OK to close the Configure Profile dialog
Select OK to apply changes
The plot will appear as shown (Figure 2).
Figure 2. Full genome Manhattan plot
For details on Genome View see Chapter 6: The Pattern Visualization System in the Partek User's Manual.
The Violin plot in Partek Genomics Suite is similar to the Profile Trellis plot in that it displays probe(set)/gene intensity values across samples and genes. However, the Violin plot has additional options not shared by the Profile Trellis plot. Here, we will explore one use case for the Violin plot.
For this example, we will use the data set and lists created in the Gene Expression tutorial. We have a list of 23 genes that are differentially regulated in tissue samples from patients with Down syndrome and normal controls. We want to display the mean intensity values for Down syndrome and normal samples for each of the 23 genes on a single plot. To do this, we first need to filter the probe intensities spreadsheet to include only the intensity values for the 23 genes of interest.
With the probe intensities spreadsheet and the gene list open in the Analysis tab, follow these steps to filter the probe intensities spreadsheet.
Select the probe intensities spreadsheet in the spreadsheet tree; here, it is Down_Syndrome-GE
Select Filter from the main task bar
Select Filter Columns
Select Filter Columns Based on a List... (Figure 1)
Figure 1. Invoking filter columns by a list
The Filter Columns dialog will open (Figure 2).
Figure 2. Configuring the Filter Columns dialog to filter by probe set ID
Select your gene list from the Filter based on spreadsheet drop-down menu; here, we selected Down_Syndrome_vs._Normal
Select the column of your gene list that matches the column IDs you want to filter from your probe intensities spreadsheet; here, we selected 2. Probeset ID
Select OK to apply the filter
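Conceptually, this filter keeps only the intensity columns whose IDs appear in the chosen column of the gene list. The Python sketch below illustrates the idea on a plain table with samples on rows and probe sets on columns. Two details are assumptions for illustration, not documented behavior: the `always_keep` parameter, which retains annotation columns such as sample ID and Type, and the probe set IDs used in the example.

```python
def filter_columns(header, rows, keep_ids, always_keep=()):
    """Keep only columns whose header is in keep_ids, plus any annotation columns in always_keep."""
    wanted = set(keep_ids) | set(always_keep)
    idx = [i for i, name in enumerate(header) if name in wanted]  # preserves original column order
    new_header = [header[i] for i in idx]
    new_rows = [[row[i] for i in idx] for row in rows]
    return new_header, new_rows
```

For example, with columns `["Sample", "Type", "P1", "P2", "P3"]` and a list containing `P1` and `P3`, the result keeps `Sample`, `Type`, `P1`, and `P3` in their original order.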
A black and yellow horizontal bar will appear at the bottom of the spreadsheet. This is the filter indicator showing the proportion of columns (genes/probesets) filtered out (black) and retained (yellow). To continue working with the filtered probeset intensities, we can clone the filtered spreadsheet.
Right-click on the filtered probe intensities spreadsheet in the spreadsheet tree
Select Clone... from the pop-up menu (Figure 3)
Figure 3. Cloning a spreadsheet with a filter applied will clone only the retained rows/columns
Name the new spreadsheet; we chose 2
Select OK
The cloned spreadsheet is a temporary file. To ensure we can use it again if we close Partek Genomics Suite, we should save the filtered probe intensities spreadsheet.
Name the new file; we chose Down_Syndrome_vs_Normal_Probe_Intensities
Now we have a spreadsheet containing only the probe intensity values for our 23 genes of interest (Figure 4).
Figure 4. Filtered probe intensities spreadsheet
We can now invoke the Violin plot. Make sure to have the filtered probe intensities spreadsheet selected (in blue) in the spreadsheet tree as shown (Figure 4).
Select View from the main taskbar
Select Violin Plot from the menu
A Violin Plot tab will open (Figure 5). This plot shows the intensity value ranges of the 23 genes (probe sets) for all samples as violin plots.
Figure 5. Viewing violin plots for 23 genes
Select View from the main taskbar
Select Toggle Properties
We can now see the plot properties panel to the left of the violin plot (Figure 6).
Figure 6. The violin plot can be configured using the plot properties panel
Although it is called the Violin plot, this visualization can also display box and whisker plots, error bar plots, and gradient plots. For this example, we will generate box and whisker plots for each gene, summarized by Type (Down syndrome and normal).
Select Box and Whisker Plot from the Plot type drop-down menu
Select Type from the Summarize by drop-down menu; this can be any categorical variable
Select Hide legend from Legend Options
Select Apply to modify the plot
The modified plot shows box and whisker plots, Down syndrome samples in red and normal in blue, for each gene (Figure 7).
Figure 7. Viewing average probe intensity values for two groups across 23 genes as box and whisker plots
To improve our view of the gene symbols, we can modify the X-axis legend.
Select X-Axis from the tabs in the plot properties panel
Set Text angle to 90 under Labels
Uncheck Truncate labels under Labels
Uncheck Show Outline under Blocks
Uncheck Columns under Attributes
Select Apply (Figure 8)
Figure 8. Configuring the X-axis label
The gene symbol for each column should now be visible (Figure 9). If probe intensities for your genes of interest span a wide range, it may help to normalize the probe intensity distribution of each gene. This is equivalent to the normalization used to display a heat map of probe intensity values.
Figure 9. X-axis now labeled with gene symbols for each gene
Select the Style tab
Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization options
Select Apply
The box and whisker plots are now centered on a mean of zero and scaled to a standard deviation of one (Figure 10). As with a heat map, this makes it easier to see which genes are upregulated and which are downregulated. Here, we can see that most of the 23 genes are expressed more highly in Down syndrome patients.
Figure 10. Viewing normalized box and whisker plots
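The Standardize option described above is a z-score applied to each gene column: subtract the column mean, then divide by the column standard deviation. The Python sketch below shows that calculation. It assumes the sample standard deviation (n - 1 denominator). Which variant Partek Genomics Suite uses is not stated here.

```python
from statistics import mean, stdev


def standardize_column(values):
    """Shift to mean 0 and scale to standard deviation 1 (sample SD, n - 1)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a column with zero variance")
    return [(v - m) / s for v in values]


def standardize_table(columns):
    """Standardize every gene column; columns maps gene -> values across samples."""
    return {gene: standardize_column(vals) for gene, vals in columns.items()}
```

Because each gene is rescaled independently, genes with very different absolute intensities end up on a common scale. That common scale is what makes up- and down-regulation comparable across the 23 genes.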
Plots can also be split by categorical variables. We can use this to visualize differential expression of genes between Down syndrome and normal patients in different tissue types.
Select Configure profile
Select Switch to Advanced (Figure 11)
Figure 11. Simple options for configuring profiles in the plot
Select Sub-Plot for Tissue (Figure 12)
Figure 12. Configuring plot properties to split by Tissue
Select OK
Several options will need to be reconfigured before we apply this change.
Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization section
Select the X-axis tab
Set Text Angle to 90
Deselect Truncate labels
Deselect Show outline
Deselect Columns
Select Apply
There should now be one sub-plot for each category; in this case, there are four sub-plots, one for each tissue (Figure 13). Several plots have no error bars because those categories do not contain enough samples.
Figure 13. Splitting a plot by a categorical factor, Tissue, and grouping by another categorical variable, Type
These sub-plots can be displayed all together, or individually.
Select 1 from the Items/Page drop-down menu
You can now move through the sub-plots by selecting Next >.
Select All from the Items/Page drop-down menu to return to the 2x2 view
This data can also be displayed as a gradient plot (Figure 14) or an error bar plot (Figure 15) by changing the Plot type in the drop-down menu on the Style tab. By default, the shading range in the gradient plot and the error bars show +/-1 standard deviation from the mean.
Figure 14. Gradient plot
Figure 15. Error bar plot
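The summary drawn for each group in the error bar and gradient plots is the mean plus or minus one standard deviation. The Python sketch below computes that summary per group. It assumes the sample standard deviation. A group with a single sample has no standard deviation, so the sketch returns no band for it. This is one plausible reason some sub-plots show no error bars. The group labels in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(values, groups):
    """Mean and (mean - 1 SD, mean + 1 SD) per group; band is None with fewer than 2 samples."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    summary = {}
    for group, vals in by_group.items():
        m = mean(vals)
        band = (m - stdev(vals), m + stdev(vals)) if len(vals) >= 2 else None
        summary[group] = (m, band)
    return summary
```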
The final option, violin plot, cannot be used to display samples grouped by a categorical variable. To view a violin plot, we must remove the Summarize by selection.
Select (One profile per sub-plot) from the Summarize by drop-down menu
Select Violin plot from the Plot type drop-down menu
Select None - do not adjust values for Normalization
Select Apply
The plot now displays violin plots for each gene showing the distribution of probe intensity values for each tissue in a separate sub-plot (Figure 16).
Figure 16. Violin plots for each gene, sub-plots for each tissue
This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.
The profile plot displays probe(set)/gene intensity values across samples and genes.
We will invoke a profile plot from a gene list child spreadsheet with genes on rows.
Select the rows to be visualized
Right-click on a row header of one of the selected rows
Select Profile Plot (Orig. Data) from the pop-up menu (Figure 1)
Figure 1. Selecting Profile Plot for selected rows
The profile plot will be displayed in a new tab (Figure 2). Lines are probe(sets)/genes and columns are samples from the parent spreadsheet.
Figure 2. Basic profile plot. Each line represents a different probe(set)/gene; each column represents a sample from the parent spreadsheet
A basic profile plot will likely need customization. The plot configuration, properties, and control options are the same as shown for a dot plot. We will illustrate a few modifications here.
We can change the row labels to show each sample ID.
Select the Axes tab
Set Grid to 1
Select Rotate X-Axis Labels and set to 90 degrees (rotates counter-clockwise)
Set Label Format to Column and select 5. Subject
We can add symbols to show which group each sample belongs to.
From the Shape by drop-down menu, select 3.Type
Select OK
Symbols have now been added to each profile line plot (Figure 3).
Figure 3. The profile plot can be modified to facilitate analysis or presentation
Note that samples present on the parent spreadsheet cannot be excluded from the profile plot. To plot only a subset of the samples you must filter the parent spreadsheet.
The XY plot / bar chart displays the intensity of one probe(set)/gene across two categorical variables. Only one probe(set)/gene may be visualized at a time.
We will invoke an XY plot from a gene list child spreadsheet with genes on rows. The parent spreadsheet should include the categorical variables you want to chart.
Right-click on the row header of the gene you want to visualize
Select XY Plot (Orig. Data) from the pop-up menu (Figure 1)
Figure 1. Invoking an XY Plot from a gene list child spreadsheet
An XY plot will be displayed in a new tab (Figure 2).
Figure 2. By default, an XY plot invoked from a gene list will have the first categorical variable as columns and the second categorical variable as shapes/colors
To display the change in gene expression over time for each treatment condition, we need to modify this plot.
Set X-Axis to 3. Time using the drop-down menu
Set Separate by to 2. Treatment using the drop-down menu
Select OK
To help visualize the connection between time points, we can add connecting lines.
Set Plot Style to lines using the drop-down menu
Select OK
The plot now shows time on the x-axis, plots each treatment as a separate series, and connects each treatment's values across time points with lines (Figure 3). Each point is the least-squares (LS) mean of all samples that share the same values for the two selected categorical variables. The error bars show the standard error.
Figure 3. Modifying the XY plot to enable analysis of gene expression changes in a treatment condition over a time course. In this experiment, only the control was measured at time 0.
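A simplified Python sketch of the per-point summary, using hypothetical (time, treatment, log2 intensity) tuples. It takes the arithmetic mean of each time/treatment cell, which equals the LS mean of that cell when the model includes both variables and their interaction, and a per-cell standard error (SD / sqrt(n)). Partek may instead use a pooled, model-based estimate, so treat this as an approximation rather than the software's computation.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def cell_summary(samples):
    """Mean and standard error for each (time, treatment) cell.

    samples: iterable of (time, treatment, log2_intensity) tuples.
    """
    cells = defaultdict(list)
    for time, treatment, value in samples:
        cells[(time, treatment)].append(value)

    summary = {}
    for key, values in cells.items():
        cell_mean = mean(values)  # equals the cell LS mean in a full two-way model with interaction
        # Standard error of the mean = SD / sqrt(n); undefined for a single sample.
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else None
        summary[key] = (cell_mean, se)
    return summary


# Hypothetical data: only the control is measured at time 0.
data = [
    (0, "control", 5.0), (0, "control", 5.2),
    (2, "control", 5.1),
    (2, "drug", 6.0), (2, "drug", 6.4),
]
for (time, treatment), (m, se) in sorted(cell_summary(data).items()):
    print(time, treatment, round(m, 2), se if se is None else round(se, 2))
```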
While most of the plot controls are shared with the dot plot, the XY plot has a few unique options.
This feature is useful when performing visual analysis of patterns in gene expression changes in a list of genes.
It is also possible to invoke an XY plot from the parent spreadsheet using the main toolbar.
Select the parent spreadsheet in the spreadsheet tree
Select View from the main toolbar
Select XY Plot / Bar Chart ...
The Create XY Plot / Barchart dialog will open (Figure 4).
Figure 4. Invoking an XY Plot from the main toolbar
An XY plot will be displayed in a new tab (Figure 5).
Figure 5. The gene name associated with the probe(set) column is displayed as the chart title by default
To switch this plot from the parent spreadsheet to one of the gene lists we have created, use the drop-down menu next to the previous/next controls.
The data displayed by an XY plot can instead be displayed as a bar chart with overlaid bars, vertically stacked bars, or horizontally stacked bars. A bar chart can be invoked directly, or an XY plot can be converted into a bar chart (and vice versa).
Invoke the plot from a gene list using the Bar Chart (Orig. Data) option in the pop-up menu (Figure 1)
Invoke the plot from the main toolbar by selecting one of the bar chart options in the Line Style drop-down menu (Figure 4)
Figure 6. An XY Plot can be converted to a Barchart using the Plot Rendering Properties dialog
This user guide describes how to export gene expression data using Partek's Report Plug-in for the Illumina GenomeStudio Gene Expression Module for use in Partek Genomics Suite. The GenomeStudio plug-in lets you export data into a project that can be directly opened in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina gene expression data into Partek Genomics Suite.
Download the plug-in zip file
Unzip the file. It contains a folder called PartekReport with two .dll files, Partek.Common.dll and Partek.GeneExpression.GenomeStudio.dll. Move the PartekReport folder to
C:\Program Files (x86)\Illumina\GenomeStudio\Modules\BSGX\ReportPlugins. If there is no ReportPlugins folder in the BSGX folder, create one. The path and folder names must exactly match the ones shown above (Figure 1).
Figure 1. Place PartekReport folder in the appropriate directory
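If you install the plug-in on several machines, the copy step can be scripted. The Python sketch below uses a hypothetical helper, `install_partek_plugin` (not part of the plug-in), that checks for the two .dll files, creates ReportPlugins if it is missing (but refuses to run if the module folder itself does not exist), and copies rather than moves the PartekReport folder so the unzipped original stays in place. Writing under Program Files usually requires administrator rights.

```python
import shutil
from pathlib import Path

# File names as listed in this guide.
EXPECTED_DLLS = ("Partek.Common.dll", "Partek.GeneExpression.GenomeStudio.dll")


def install_partek_plugin(plugin_dir, report_plugins_dir):
    """Copy an unzipped PartekReport folder into a GenomeStudio ReportPlugins folder.

    Hypothetical helper for illustration; returns the destination folder.
    """
    plugin_dir = Path(plugin_dir)
    missing = [name for name in EXPECTED_DLLS if not (plugin_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"PartekReport folder is missing: {', '.join(missing)}")

    target_root = Path(report_plugins_dir)
    # The module folder (e.g. BSGX) must already exist; only ReportPlugins may be created.
    if not target_root.parent.is_dir():
        raise FileNotFoundError(f"Module folder not found: {target_root.parent}")
    target_root.mkdir(exist_ok=True)

    # The folder must keep the exact name PartekReport inside ReportPlugins.
    destination = target_root / "PartekReport"
    shutil.copytree(plugin_dir, destination, dirs_exist_ok=True)  # Python 3.8+
    return destination


# Example call (the download location is a placeholder):
# install_partek_plugin(
#     r"C:\path\to\unzipped\PartekReport",
#     r"C:\Program Files (x86)\Illumina\GenomeStudio\Modules\BSGX\ReportPlugins",
# )
```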
In GenomeStudio gene expression project:
Choose Analysis > Reports... from the main menu
Select Custom Report and choose Partek Report Plug-in from the drop-down list
Specify AnnotationName (do NOT include <> in the name). You can use the same name as the .bgx file you imported the data with, or a name unique to your dataset
Choose Type by clicking on the cell; the default is gene level
Leave all other settings at their default values (Figure 2)
Specify the report file name. We recommend putting the exported files in their own folder, which allows you to move the folder instead of moving each file individually.
Click OK
Figure 2. Configuring the GenomeStudio gene expression report dialog
Five files are exported, including a project file (.ppj) that can be opened directly in Partek Genomics Suite. The project file opens the signal intensity data in a spreadsheet and associates the annotation information with the intensity spreadsheet. All intensities are log2 transformed. If there are negative values in AVG_Signal, the data are shifted so that the lowest value becomes one and are then log2 transformed.
To open the report, launch Partek Genomics Suite, choose File > Open Project, and browse to the .ppj file to open it. In the Gene Expression workflow, you can then proceed to the add sample attribute step.
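The shift-then-log rule can be sketched in Python. This is an interpretation, not the plug-in's code: it assumes a single constant is added to every AVG_Signal value so that the lowest value becomes 1 (which maps to 0 after log2), and that no shift is applied when no value is negative. The guide does not cover zeros; with no negative values present, a zero would make `math.log2` raise a `ValueError` in this sketch.

```python
import math


def log2_with_shift(signals):
    """Log2-transform AVG_Signal values, shifting first if any are negative."""
    lowest = min(signals)
    if lowest < 0:
        # Add the same offset to every value so the lowest becomes 1 (log2(1) = 0).
        offset = 1 - lowest
        signals = [s + offset for s in signals]
    return [math.log2(s) for s in signals]


# Hypothetical signals: -3 is shifted to 1, so the values become 1, 5, and 9 before log2.
print(log2_with_shift([-3.0, 1.0, 5.0]))
```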
This user guide describes how to export copy number and genotype data using Partek's Report Plug-in for the Illumina GenomeStudio Genotype Module for use in Partek Genomics Suite. The GenomeStudio plug-in lets you export data into a project that can be opened directly in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina genotype and copy number data into Partek Genomics Suite.
Download the plug-in zip file
Unzip the file. It contains a folder called PartekReport with two .dll files, Partek.Common.dll and Partek.GeneExpression.GenomeStudio.dll. Move the PartekReport folder to
C:\Program Files\Illumina\GenomeStudio 2.0\Modules\BSGT\ReportPlugins. If there is no ReportPlugins folder in the BSGT folder, create one. The path and folder names must exactly match the ones shown above (Figure 1).
In GenomeStudio genotype project:
Choose Analysis > Reports > Report Wizard from the main menu
Select Custom Report and choose Partek Report Plug-in from the drop-down list
Specify AnnotationName (do NOT include <> in the name). You can use the same name as the .bgx file you imported the data with, or a name unique to your dataset
Figure 1. Configuring the GenomeStudio copy number report dialog
Leave all other settings at their default values (Figure 2) and click Next
Specify the report file name. We recommend putting the exported files in their own folder, which allows you to move the folder instead of moving each file individually.
Click Finish (Figure 2)
Figure 2. Specify output folder and file name
The output consists of nine files in the folder: a project file (.ppj), an annotation file, a summary file, and three Partek spreadsheets, each of which consists of two files.
To open the report, launch Partek Genomics Suite, choose File > Open Project, and browse to the .ppj file to open it. Three spreadsheets will open (Figure 3).
Figure 3. Open project in Partek Genomics Suite
Spreadsheet 1 contains genotype calls; spreadsheet 2 contains the log R ratio, which is copy number on a log scale; spreadsheet 3 contains the B allele frequency.
To perform copy number analysis, select spreadsheet 2 (log R ratio), choose the Copy Number workflow, and start from the QA/QC section. The genotype spreadsheet is used for the Association and LOH workflows.
GO ANOVA output is very similar to standard ANOVA output except each row in the resulting sheet contains statistical results from a single GO functional group rather than a single gene. Columns can be broken down into four sections:
Annotations contain detail about the category being considered
ANOVA results contain the significance of the effect of the factors in the model
Contrast results contain significance and fold change of the difference between groups compared via contrast
F-ratios display the significance of the factors in the ANOVA model
Annotations will take up the first four columns of the results sheet (Figure 1). The first column (# of genes) is the number of genes in the GO category. Note that this is not necessarily the number of unique genes in the category; depending on the technology, it can be the number of probes or probe sets on the microarray whose targets fall into the GO category, so genes targeted more than once are counted more than once. The second column (GO ID) is the unique numeric identifier of the GO category; it is sometimes useful for searching when the GO category has a very long name. The third column is the type of the GO category, and the fourth column (GO Description) is the name of the GO category.
Figure 1. GO ANOVA annotation columns (example)
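To see why the # of genes value can exceed the number of distinct genes, here is a minimal Python sketch with hypothetical probe IDs and gene symbols. It assumes, as described above, that the column counts every probe (set) in the category, so a gene targeted by two probes contributes two.

```python
def go_category_counts(probe_to_gene, probes_in_category):
    """Return (# of genes as reported, number of distinct genes) for one GO category."""
    # '# of genes' counts probes/probe sets, so a repeated gene is counted again.
    reported = len(probes_in_category)
    distinct = len({probe_to_gene[p] for p in probes_in_category})
    return reported, distinct


# Hypothetical annotation: probes p1 and p2 both target GENE_A.
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
print(go_category_counts(probe_to_gene, ["p1", "p2", "p3"]))  # (3, 2)
```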
Right-click on any row header and choose Create Gene List to generate a new spreadsheet containing the genes (probes/probe sets) in the selected GO category.
ANOVA results will include a column for each factor in the setup (Figure 2). Each column named with a factor or interaction followed by p-value shows how significant the effect of that variable is on the data. A lower p-value corresponds to a more significant effect. For example, a p-value of 0.1 for tissue means that, if the tissues were truly equivalent for the genes in the functional group, a difference between tissues at least as large as the one observed (relative to the inherent variability of the measurements) would occur 10% of the time. A p-value of 0 occurs when the value is too small to be displayed. This can be caused by a very low estimate of inherent variability due to either a very small number of replicates or severely unbalanced data.
Figure 2. Viewing the GO ANOVA result
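Stated formally for an F test, assuming each factor's p-value is computed from its F-ratio, with $F_{\text{obs}}$ the observed F-ratio, $d_1$ and $d_2$ the factor and error degrees of freedom (which the guide does not list), and $H_0$ the hypothesis that the factor has no effect:

```latex
p = \Pr\left(F_{d_1,\,d_2} \ge F_{\text{obs}} \mid H_0\right)
```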
In the example experiment, a low p-value for tissue would imply the functional group is differentially expressed across tissues.
A low p-value for an interaction implies that the effect of one factor depends significantly on the level of the other. In the example dataset, no interactions between the two main variables were included as factors. To illustrate what the interaction p-value would mean, consider the case where a drug compound and a control injection were dosed over several time points and an interaction between injection compound and time point was included in the GO ANOVA. A low p-value for the drug-time point interaction indicates that the effect of the drug on the functional group changes with time.
A column will also be present for each factor placed in the Disruption Factor(s) box. This column will have the header Disruption(Factor name). A low p-value in this column indicates that the levels of the factor show different patterns across the genes within the functional group. For functional groups containing only a single gene, no value will be present because the pattern cannot change. In the example experiment, a low p-value for Disruption(Tissue) identifies functional categories that have different genes operating in the heart and in the brain.
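For the hypothetical drug-by-time example with two time points $t_1$ and $t_2$, the interaction can be written as a difference of differences between group means $\mu$. This is the standard two-way ANOVA formulation, not taken from Partek's documentation; a low interaction p-value indicates that this quantity is unlikely to be zero:

```latex
\Delta_{\text{drug}\times\text{time}} =
  \left(\mu_{\text{drug},\,t_2} - \mu_{\text{drug},\,t_1}\right)
  - \left(\mu_{\text{control},\,t_2} - \mu_{\text{control},\,t_1}\right)
```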
Contrast results include four columns for each of the comparisons declared during GO ANOVA setup. The first column contains the p-value representing the significance of the difference between the two categories. The second column contains the ratio between the two groups, where increases are represented as values greater than one and decreases as values between zero and one. The third column is the fold change of the functional group between the two categories, where increases are greater than one and decreases are less than negative one. The fourth column contains a plain text description of the direction of the fold change. Fold changes and ratios represent the average change in the functional category. In the example, a contrast was run comparing expression in the cerebral tissue to the heart tissue (Figure 3). As these were the only tissues, the p-values are identical to those in column 5. While the p-value column shows which functional groups are differentially expressed between the tissues, the fold change columns show by how much. Using the sign of the fold change, or the description column, you can see which categories are increased in brain and which are increased in heart.
Figure 3. Viewing the GO ANOVA contrast columns
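The relationship between the ratio and fold change columns can be sketched in Python. The conversion follows the description above (the fold change equals the ratio for increases and the negative reciprocal of the ratio for decreases). Computing the ratio as 2 raised to the difference of the group means assumes log2-transformed intensities and is an assumption about how the ratio is derived; the function names are hypothetical.

```python
def ratio_from_log2_means(mean_a, mean_b):
    """Ratio of group A to group B when both means are on the log2 scale."""
    return 2.0 ** (mean_a - mean_b)


def ratio_to_fold_change(ratio):
    """Signed fold change: >= 1 for increases, <= -1 for decreases."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # A ratio of 0.5 (a halving) is reported as a fold change of -2.
    return ratio if ratio >= 1 else -1.0 / ratio


# Hypothetical log2 means for one functional group (brain vs heart).
print(ratio_to_fold_change(ratio_from_log2_means(8.0, 9.0)))  # -2.0
```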
F-Ratios
F-ratios (Figure 4) are used in the computation of p-values. The values in the columns can safely be ignored by most users; there are exceptional cases when the F-ratios may be informative. To see the general significance of the factors included in the model, a Sources of Variation plot can be computed from these values from the View menu (or the Workflow). The higher the average F-ratio, the more important the factor is to the model on average.
Figure 4. Viewing the GO ANOVA F-ratios
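A minimal Python sketch of the "higher average F-ratio, more important factor" reading, with hypothetical factor names and values. Partek's Sources of Variation plot may compute or display the averages differently (for example, alongside an error term), so this only illustrates the averaging and ranking idea.

```python
from statistics import mean


def average_f_ratios(rows, factors):
    """Average each factor's F-ratio over all result rows (one row per GO category).

    rows: list of dicts mapping factor name -> F-ratio; missing or None values are skipped.
    """
    averages = {}
    for factor in factors:
        values = [row[factor] for row in rows if row.get(factor) is not None]
        averages[factor] = mean(values) if values else None
    return averages


def rank_factors(averages):
    """Factors ordered from highest to lowest average F-ratio."""
    scored = [f for f, v in averages.items() if v is not None]
    return sorted(scored, key=lambda f: averages[f], reverse=True)


# Hypothetical F-ratios for two GO categories; "Batch" is an illustrative factor.
rows = [{"Tissue": 12.0, "Batch": 1.5}, {"Tissue": 8.0, "Batch": 2.5}]
avg = average_f_ratios(rows, ["Tissue", "Batch"])
print(avg)                # {'Tissue': 10.0, 'Batch': 2.0}
print(rank_factors(avg))  # ['Tissue', 'Batch']
```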
Select () from the plot command bar
Select to open the Configure Plot dialog
Select ()
Select ()
Select () from the plot command bar
Select () from the plot command bar
Select () to automatically cycle through each row (gene) in the source spreadsheet
Select () to stop the cycling
The drop-down menu adjacent to the previous/next () controls lets you switch source spreadsheets.
Lines, but not points, can be selected when using Selection Mode ().
Selecting previous/next () will navigate along either rows or columns, whichever has probe(set)/gene information.
Invoke the plot as an XY plot, select (), then select one of the bar chart options from the Plot Style drop-down menu in the Plot Rendering Properties dialog (Figure 6)