User Manual

Partek Genomics Suite is a comprehensive suite of advanced statistics and interactive data visualization specifically designed to reliably extract biological signals from noise. Designed for high-dimensional genomic studies containing thousands of samples, Partek Genomics Suite is fast, memory efficient and will analyze large data sets on a personal computer. It supports a complete workflow including convenient data access tools, identification and annotation of important biomarkers, and construction and validation of predictive diagnostic classification systems.

Lists

Scientists often develop lists of genes, probes, transcripts, SNPs, and genomic regions of interest from analysis tools, research papers, and databases. Using Partek Genomics Suite, these lists can be integrated with genomics data sets, analyzed with powerful statistics, and visualized for new insights.

This user guide will illustrate:

Importing a text file list
Adding annotations to a gene list

This user guide does not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature in Partek Genomics Suite that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a Partek Genomics Suite feature on an imported list that you think should be included in this user guide, please let us know.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Importing a text file list

The preferred method for importing a generic data spreadsheet into Partek Genomics Suite is as a text file. Here, we illustrate importing a list of genes with p-value and fold-change from an experiment comparing two conditions.

Select File from the main toolbar
Select Import
Select Text (.csv .txt)...
Select the text file using the file browser to launch the Import .txt, .tsv, or .csv File dialog

The File Type section of the Import dialog includes a preview of the text file and import options (Figure 1).

The columns in the import file can be separated by a tab, comma, or any other character.

For most applications, the items on the list should be in rows while attributes or values should be in columns. If a list is oriented with items on columns, select Transpose the file to import a transposed spreadsheet.

Select Next > to move to the Data Type section
Select your data type; here we have chosen Genomic Data because it is a gene list (Figure 2)

We have also deselected Is the data log transformed (LOG_base (x+offset) ) ?

Selecting Genomic Data will open a dialog after import to configure properties for the imported list including selecting the type of genomic data, the location of genomic features in the spreadsheet, the annotation column with gene symbols, the chip or reference source and annotation file, the species, and reference genome build.

Select Next >

The next step is to identify where the data starts and where the optional header is found using Identify Column Labels, Start of Data (Figure 3). The line that contains the header (if present) must precede the data. If there are lines to be skipped in the file (like comments), they may only appear at the top of the file, before the header line or data begin.

If there are many comment lines at the start of the file, you may need to select View Next 5 Records to get to the row that contains the column header. If you accidentally move past the screen that contains the header or data rows, select View Previous 5 Records.

If there are missing numerical values or empty cells in your input list, insert a special character or symbol (?, N/A, NA, etc.) in the missing cells; you will specify the character in the Missing Data Representation section of the dialog. Only one symbol can be used to represent missing values. The default missing value indicator is ?.

If a header row is present, select Col Lbls to allow you to select a column header row
Select the row where the data begins using the Begin Data selector
If any cells have a missing value, you can signify this with a special symbol selected using the Missing Data Representation panel
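As an illustration of the file layout described above, the following Python sketch builds a tab-delimited gene list with a header line followed by the data and a single symbol for missing values. The gene symbols, values, and column names are invented for the example, and only the Python standard library is used; it is one way to prepare such a file, not a required procedure.

```python
import csv
import io

MISSING = "?"  # default missing value indicator named in the text

# Hypothetical gene list: (gene symbol, p-value, fold-change); None = missing
records = [
    ("GENE_A", 0.0012, 2.4),
    ("GENE_B", 0.045, None),
    ("GENE_C", None, -1.8),
]

def gene_list_text(rows, missing=MISSING):
    """Return tab-delimited text with items on rows and attributes on columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # The header line must precede the data
    writer.writerow(["Gene Symbol", "p-value", "Fold-Change"])
    for row in rows:
        # Use exactly one symbol for every missing cell
        writer.writerow([missing if value is None else value for value in row])
    return buffer.getvalue()

text = gene_list_text(records)
```

Writing the returned text to a .txt file gives a list that can be selected in the file browser; the same symbol would then be chosen in the Missing Data Representation panel.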

The Preview text encoding section (Figure 4) previews the first five lines of the file, allowing you to check if the text encoding is correct.

If the text does not appear properly, use the Specify the text encoding: drop-down menu to choose the correct encoding

Select Next >

The final section of the Import .txt, .tsv, or .csv File dialog is Verify Type & Attribute of Data Columns (Figure 5). While data column type and attribute can be modified after import, it is easier and faster to select the proper options during import as multiple columns may be selected during this dialog.

Check and modify column types and attributes

If there is an identifier like gene symbol or SNP, the Type field for that column should be set to text and Attribute should be set to label. Numeric values (intensities, p-values, fold-changes, etc.) should have Type set to double and Attribute set to response. The other possible value for Attribute is factor, which describes sample data. The user interface in this dialog allows you to select multiple columns at once (Ctrl+left click and Shift+left click). The interface controls are detailed in the dialog (Figure 5).

Select Finish to import the text file and open it as a spreadsheet

If Genomic Data was selected in the Data Type section, the Configure Genomic Properties dialog will open (Figure 6). These options will be discussed in the next section when we add an annotation file.

Select OK

The imported spreadsheet will open (Figure 7).

Tasks available for a gene list

Starting with a list of SNPs

A list of SNPs using dbSNP IDs can be imported as a text file and associated with an annotation file as described for a list of genes. The annotation file you use to annotate the SNPs should minimally contain the chromosome number and physical position of each locus.

Novel SNPs or SNPs that are not found in your annotation source must be imported as a region list. For this, follow the procedure for importing a region list, but use the SNP name in place of a region name.

Annotating SNPs with genes

Starting with a list of SNPs that have been associated with genomic loci using an annotation file and assigned a species with genome build, you can use Find Overlapping Genes

Gene Ontology ANOVA

With gene ontology (GO) ANOVA, Partek Genomics Suite includes the ability to use rigorous statistical analysis to find differentially expressed functional groupings of genes. Leveraging the Gene Ontology database, Partek Genomics Suite can organize genes into functional groups. GO ANOVA can detect not only up- and down-regulated functional groups, but also functional groups that are disrupted in only a few genes as a result of treatment. Moreover, the common vocabulary of the GO effort enables this analysis to be compared across all types of gene expression data, including those from other species. Traditional tests, such as GO enrichment, require defining filtered lists of differentially expressed genes followed by an analysis of functional groups related to those genes. GO ANOVA, on the other hand, is performed directly after data import and normalization. This minimizes the risk that a highly stringent filter will cause important functional groups to be overlooked.

Other tests, such as gene set enrichment analysis (GSEA), tolerate minimal or no pre-filtering. However, these tests are very limited in their ability to integrate complicated experimental designs. GSEA, for example, can only handle two groups at a time. GO ANOVA, on the other hand, can leverage the wealth of sample information collected and use powerful multi-factor ANOVA statistics to analyze very complex interactions and regulatory events. The analysis output includes detailed statistical results specifying the effect and importance of phenotypic information on differential expression and subsequent disruption of Gene Ontology functional categories. Furthermore, GSEA calculates enrichment scores using a running-sum statistic on a ranked gene list. GO ANOVA takes into account more information by utilizing each sample’s expression values to calculate the enrichment score.

Note that the same principles apply to Pathway ANOVA, the only difference being the mapping file; GO ANOVA organizes genes into GO categories, while Pathway ANOVA looks at pathways.

Implementation Details

The method used to detect changes in functional groups is ANOVA. For detailed information about ANOVA, see Chapter 11 of the Partek User Manual. There is one result per functional group based on the expression of all the genes contained in the group. Besides all the factors specified in the ANOVA model, the following extra terms will be added to the model by Partek Genomics Suite automatically:

Gene ID - Since not all genes in a functional group express at the same level, gene ID is added to the model to account for gene-to-gene differences
Factor * Gene ID (optional) - Interaction of gene ID with the factor can be added to detect changes within the expression of a GO category with respect to different levels of the factor, referred to in this document as the disruption of the category's expression pattern or simply disruption

Suppose there is an experiment to find genes differentially expressed in two tissues: two different tissues are taken from each patient, and a paired-sample t-test or 2-way ANOVA can be used to analyze the data. The GO ANOVA dialog allows you to specify the ANOVA model, which includes the two factors: tissue and participant ID. The analysis is performed at the gene level, but the result is displayed at the level of the functional group by averaging the member genes’ results. The equation of the model that can be specified is:

y = µ + T + P + ε

y: expression of a functional group
µ: average expression of the functional group
T: tissue-to-tissue effect
P: patient-to-patient effect
ε: error term
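To make the paired design concrete, the sketch below estimates the tissue effect for one functional group as the mean within-patient difference, which is what a paired-sample t-test evaluates when each patient contributes both tissues. The patient labels and log2 values are invented, and the code works on already-averaged group values for simplicity; it is an illustration of the idea, not the software's implementation.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical log2 expression of one functional group, one value per patient
heart = {"P1": 9.1, "P2": 8.7, "P3": 9.4}
brain = {"P1": 7.2, "P2": 7.0, "P3": 7.3}

def paired_tissue_effect(tissue_a, tissue_b):
    """Return (mean paired difference, paired t statistic)."""
    # Differencing within a patient removes the patient-to-patient effect P
    diffs = [tissue_a[p] - tissue_b[p] for p in sorted(tissue_a)]
    effect = mean(diffs)
    t_stat = effect / (stdev(diffs) / sqrt(len(diffs)))
    return effect, t_stat

effect, t_stat = paired_tissue_effect(heart, brain)
```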

When the tissue is interacted with the gene ID then the ANOVA model becomes more complicated as demonstrated in the model below. The functional group result is not explicitly derived by averaging the member genes as the new model includes terms for both gene and group level results:

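y = µ + T + P + G + T*G + ε

y: expression of a functional group
µ: average expression of the functional group
T: tissue-to-tissue effect
P: patient-to-patient effect
G: gene-to-gene effect
T*G: tissue-by-gene interaction (disruption)
ε: error term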

If more than one data column maps to the same gene symbol, Partek Genomics Suite will assume that the markers target different isoforms and will not treat the two markers as replicates of the same gene. Instead, each column is treated as a gene unto itself.

If there are only two samples in the spreadsheet, Partek Genomics Suite cannot calculate a type by gene ID interaction. In this case, the result spreadsheet will contain a column labeled Disruption score. First, for each gene in the functional group, Partek Genomics Suite will calculate the difference between the two samples. A z-test is used to compare the difference between each gene and the rest of the genes in the functional group. The disruption score is the minimum p-value from the z-tests comparing each gene to the rest in the functional group. A low disruption score therefore indicates that at least one gene behaves differently from the rest. This implies a change in the pattern of gene expression within the functional group and potential disruption of the normal operation of the group. The category as a whole may or may not exhibit differential expression in addition to the disruption.
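The two-sample disruption score can be sketched as follows. The text does not give the exact form of the z-test, so this sketch assumes that each gene's difference is compared with the mean and standard deviation of the other genes' differences and converted to a two-sided normal p-value; the expression values are invented, and the result should not be expected to match the software's output.

```python
from math import erfc, sqrt
from statistics import mean, stdev

def disruption_score(sample_a, sample_b):
    """Minimum two-sided z-test p-value over the genes of one functional group.

    ASSUMPTION: each gene's difference is tested against the mean and
    standard deviation of the remaining genes' differences. Requires at
    least three genes whose remaining differences are not all identical.
    """
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    p_values = []
    for i, d in enumerate(diffs):
        rest = diffs[:i] + diffs[i + 1:]
        z = (d - mean(rest)) / stdev(rest)
        p_values.append(erfc(abs(z) / sqrt(2)))  # two-sided normal p-value
    return min(p_values)

# Hypothetical log2 values for five genes; the last gene moves the other way
sample_1 = [8.0, 7.5, 9.1, 6.8, 8.3]
sample_2 = [7.0, 6.4, 8.2, 5.9, 9.9]
score = disruption_score(sample_1, sample_2)
```

Here the score is very small because one gene changes in the opposite direction from the other four, which is the situation the text describes as disruption.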

Configuring the GO ANOVA Dialog

The setup dialog for GO ANOVA can be found in the Biological Interpretation section of the expression workflows (Gene Expression, MicroRNA Expression, Exon, RNA-Seq, miRNA-Seq). It is recommended that GO ANOVA is run on the sheet with expression levels, after import and normalization, though GO ANOVA can be run on any spreadsheet with samples on rows and genes on columns. If a child spreadsheet is selected, such as the result of a prior ANOVA analysis, then the test will be automatically run on the parent spreadsheet.

Upon selecting GO ANOVA (Biological Interpretation > Gene Set Analysis), Partek Genomics Suite will first offer the opportunity to configure the parameters of the test and exclude functional groups with too few or too many genes (Figure 1). To save time when running GO ANOVA, the size of GO categories analyzed can be limited using the Restrict analysis to function groups with fewer than __ genes option. Large GO categories may be less interesting and also take the most time to analyze. We recommend restricting the analysis to groups with fewer than 150 genes, as it can make the analysis much quicker (and the results easier to interpret). In the current example, the maximum category size was set to only 20 genes, for demonstration purposes only.
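The effect of the size restriction can be pictured with a short Python sketch. The group names and gene identifiers are invented, and the lower limit of two genes is an assumption made only for the example; the upper limit of 150 is the recommendation given above.

```python
MAX_GENES = 150  # recommended upper limit from the text
MIN_GENES = 2    # ASSUMPTION: lower limit chosen only for illustration

def restrict_groups(groups, min_genes=MIN_GENES, max_genes=MAX_GENES):
    """Keep groups with at least min_genes and fewer than max_genes genes."""
    return {
        name: genes
        for name, genes in groups.items()
        if min_genes <= len(genes) < max_genes
    }

# Hypothetical functional groups mapped to their member genes
groups = {
    "small_group": ["g1"],
    "medium_group": ["g%d" % i for i in range(20)],
    "large_group": ["g%d" % i for i in range(400)],
}
kept = restrict_groups(groups)
```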

Figure 1. Configure the parameters of the test: gene ontology categories with too few or too many genes can be excluded

Performing GO ANOVA

Preparing a data set for analysis requires importing the data, normalizing the data as appropriate for standard gene expression analysis, and inserting columns containing the experimental variables. Checkout for more details about preparing data. It is not necessary to perform a differential analysis of gene expression before GO ANOVA.

For the sake of example, the following walkthrough will consider an experiment that has been imported which includes two different tissues, brain tissue and heart tissue, extracted from a small set of patients.

The GO ANOVA function is available in the Gene Expression, microRNA Expression, RNA-Seq, and miRNA-Seq workflows.

Select the Gene Expression

GO ANOVA Visualizations

There are two main visualizations for use with GO ANOVA outputs:

Dot plots used to visualize differential expression of functional groups
Profile plots used for visualizing disruption of gene expression patterns within the group

Dot Plots

Dot plots represent each sample with a single dot. The position of each dot is calculated as the average expression of all genes included in the functional group. Invoke this plot by right-clicking on the row header of a functional group of interest and choosing Dot Plot (Orig. Data). The color, shape, and size of the dots can be set to represent sample information in the plot properties dialog, invoked by clicking the red ball in the upper left.

Figure 1 shows a dot plot for a GO category "cell growth involved in cardiac muscle cell development", which is expressed in the heart at a level of almost four times that of the brain, evidenced by the difference of just under two units on the y-axis (in the current example the values on the y-axis are shown in log2 space). Note that the replicates are grouped neatly, making this category highly significant. That is not a surprise, given that the genes belonging to that category are likely very specific for the heart.
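The relationship between a difference on a log2 axis and a fold change is simple arithmetic, sketched here in Python; the function name is made up for the example.

```python
def fold_change_from_log2(log2_difference):
    """Convert a difference between two log2 values into a fold change."""
    return 2 ** log2_difference

exact_two_units = fold_change_from_log2(2.0)   # two units correspond to four-fold
just_under_two = fold_change_from_log2(1.9)    # slightly less than four-fold
```

A difference of just under two units therefore corresponds to a ratio of almost four, as stated above.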

Figure 1. Dot plot of a significantly differentially expressed GO category. Each dot is a sample, box-and-whiskers summarize groups

Profile Plots

Profile plots or profiles represent each category of one of the GO ANOVA factors as a few overlapping lines. Horizontal coordinates refer to individual genes or probes in the original data. Vertical coordinates represent the expression of the individual gene. Invoke this plot by right-clicking on the row header of a functional group of interest and choosing Profile (Orig. Data). This plot is useful as the pattern of gene expression in the group is displayed as a line. If the pattern is conserved across treatments, the lines will lie parallel, but if a gene reacts differently, the lines will follow a different pattern and may even cross each other.

The profile plot in Figure 2 shows a GO category without differential expression, but with significant disruption. Note that the gene TNNI3 is up-regulated in the heart, while STX1A is down-regulated in the heart.

Figure 2. Profile plot of a GO category with significant disruption but not differential expression. Each data point is a gene (error bars are standard error of the mean)

Recommended Filters

When looking for simple differential expression, sorting the factor p-values in ascending order is ideal. This will find the groups that differ most significantly across all the contained genes. In the interest of finding groups that are less likely to be called by chance, it may be wise to filter to groups with a minimum of 4 or 5 genes (Figure 1). Simple filters can be applied using the interactive filter, available from the button on the toolbar at the top of the screen.

If there is more than one factor in the model, more complex criteria combining the factors can be specified using the Advanced tab of the List Manager (Tools > List Manager). For example, to find categories that are significant and changed by at least two-fold, make two criteria, one for a low p-value and the other for a minimum two-fold change, and take the intersection of the two criteria.
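The logic of intersecting two criteria can be sketched outside the software as set operations. The group names, p-values, and fold changes below are invented, and the 0.05 p-value cutoff is an assumption for the example; this does not reproduce the List Manager itself.

```python
# Hypothetical GO ANOVA results: group -> (factor p-value, fold change)
results = {
    "group_1": (0.001, 3.2),
    "group_2": (0.200, 2.5),
    "group_3": (0.010, 1.3),
    "group_4": (0.030, -2.1),
}

def intersect_criteria(table, max_p=0.05, min_fold=2.0):
    """Return the groups that satisfy both criteria."""
    # Criterion 1: low p-value (ASSUMED cutoff of 0.05)
    low_p = {name for name, (p, _) in table.items() if p <= max_p}
    # Criterion 2: at least two-fold change in either direction
    big_change = {name for name, (_, fc) in table.items() if abs(fc) >= min_fold}
    return low_p & big_change

selected = intersect_criteria(results)
```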

Figure 1. Top ten functional groups sorted by the Tissue p-value after filtering to a minimum of five genes in the GO category. Note that most of the groups can be directly related to the heart muscle

If the disruption (factor*gene interaction) is tested, the filters can become more complicated. The most pressing need for complex filters is that when analyzing larger functional groups it is not expected that the entire functional group will behave the same. Looking back at Figure 1, notice how the low values in column 7 are present because not every gene is equally differentially expressed even in the most differentially expressed of groups. That is, when there is significant differential expression, it is likely that there will also be disruption as at least a single gene is likely participating in a role beyond that of the functional group and will not follow the pattern of the rest of the group. This situation is expected and leads to a new type of filter.

Filtering for low p-values on the factor and then filtering for low p-values on the factor interacted with gene will find groups that are differentially expressed, but contain at least a few genes that are either disrupted due to treatment, or simply are involved in additional functional groups beyond the scope of the one being analyzed. This list often contains some of the more informative big picture functional groups.

Figure 2. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category. By prioritizing by the disruption column this type of a list is more "big picture"

If looking for groups that are not so much differentially expressed, but instead express different genes for different treatments, filter for low disruption p-values but for high factor p-values. As shown by Figure 2, large or diverse groups that are differentially expressed will often exhibit significant disruption. In fact, a group that is differentially expressed but includes even a single gene that is not changed will have very significant disruption. These situations are certainly notable, but are distracting if looking for functional groups that instead are uniquely patterned based on treatment. By filtering out those groups with low p-values for the factor and then looking at the remaining groups with low p-values for disruption, the groups observed usually have very distinct patterns of expression (Figure 3).
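The combined filter described in this paragraph can be sketched as follows. The minimum of five genes and the minimum factor p-value of 0.3 follow the settings quoted for Figure 3; the 0.05 disruption cutoff, the field names, and the rows are assumptions made for the example.

```python
def uniquely_patterned(rows, min_genes=5, min_factor_p=0.3, max_disruption_p=0.05):
    """Keep groups with high factor p-values and low disruption p-values."""
    kept = [
        row for row in rows
        if row["genes"] >= min_genes
        and row["factor_p"] >= min_factor_p          # not differentially expressed
        and row["disruption_p"] <= max_disruption_p  # ASSUMED cutoff
    ]
    # Most significant disruption first
    return sorted(kept, key=lambda row: row["disruption_p"])

# Hypothetical result rows
rows = [
    {"name": "group_1", "genes": 12, "factor_p": 0.001, "disruption_p": 0.0001},
    {"name": "group_2", "genes": 8, "factor_p": 0.60, "disruption_p": 0.002},
    {"name": "group_3", "genes": 3, "factor_p": 0.75, "disruption_p": 0.001},
    {"name": "group_4", "genes": 25, "factor_p": 0.45, "disruption_p": 0.0005},
    {"name": "group_5", "genes": 9, "factor_p": 0.90, "disruption_p": 0.40},
]
top = [row["name"] for row in uniquely_patterned(rows)]
```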

Figure 3. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category and minimum Tissue p-value of 0.3. This list is especially interesting, as using enrichment alone to detect such categories would require a lot of labor.

Visualizations

This user guide illustrates:

Dot Plot

The primary use of the dot plot is visualizing intensity values across samples.

We will invoke a dot plot from a gene list child spreadsheet with genes on rows.

Right-click on the row header of the gene you want to visualize
Select Dot Plot from the pop-up menu (Figure 1)

Profile Plot

The profile plot displays probe(set)/gene intensity values across samples and genes.

We will invoke a profile plot from a gene list child spreadsheet with genes on rows.

Select the rows to be visualized
Right-click on a row header of one of the selected rows

Volcano Plot

The volcano plot displays p-values and fold-changes of numerous genomic features (e.g., genes or probe sets) at the same time. This allows differentially expressed genes to be quickly identified and saved as a gene list.

Note: the same list can be generated without a visual aid using the List Manager (ANOVA Streamlined tab).

We will invoke a volcano plot from an ANOVA results child spreadsheet with genes on rows.

Select View from the main toolbar

Visualizing NGS Data

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Visualizations of Next Generation Sequencing Data.pdf (PDF, 823KB)

Chromosome View

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Methylation Workflows

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Trio/Duo Analysis

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Using the Trio Workflow.pdf (PDF, 689KB)

Association Analysis

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Using the Association Workflow.pdf (PDF, 1MB)

LOH detection with an allele ratio spreadsheet

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

AlleleRatioLOHDocumentation.pdf (PDF, 110KB)

Import data from Agilent feature extraction software

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

ImportingAgilentDataintoPartek.pdf (PDF, 193KB)

Illumina GenomeStudio Plugin

The GenomeStudio plug-in lets you export data into a project that can be opened directly in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina data into Partek Genomics Suite.

Import gene expression data
Import Genotype Data

Import gene expression data

This user guide describes how to export gene expression data using Partek's Report Plug-in for Illumina GenomeStudio Gene Expression Module for use in Partek Genomics Suite. The GenomeStudio plug-in lets you export data into a project that can be directly opened in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina gene expression data into Partek Genomics Suite.

Partek Gene Expression plug-in installation

Download the plug-in zip file

PartekReportGX.zip (55KB)

Export report from GenomeStudio

In GenomeStudio gene expression project:

Choose Analysis > Reports... from the main menu
Select Custom Report and choose Partek Report Plug-in from the drop-down list
Specify AnnotationName; do NOT include <> in the name. You can use the same name as the .bgx file you imported the data with, or a name unique to your dataset

Figure 2. Configuring the GenomeStudio gene expression report dialog

Five files are exported, including a project file (.ppj), which can be opened directly in Partek Genomics Suite. The project file opens the signal intensity data in a spreadsheet and associates the annotation information with the intensity spreadsheet. All intensities are log2 transformed. If there are negative values in the AVG_Signal, the data will be shifted so that the lowest value is one and then log2 transformed.
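The shift-then-transform step can be illustrated with a few lines of Python. This sketch reads the description above as adding a constant so that the lowest value becomes one whenever negative values are present; the function name and signal values are invented, and the plug-in's exact behavior may differ.

```python
from math import log2

def shifted_log2(signals):
    """Log2 transform, shifting first if any value is negative."""
    lowest = min(signals)
    # ASSUMPTION: shift only when negatives are present, so the minimum becomes 1
    offset = 1 - lowest if lowest < 0 else 0
    return [log2(value + offset) for value in signals]

with_negatives = shifted_log2([-3.0, 0.0, 4.0])  # shifted to 1, 4, 8 before log2
without_negatives = shifted_log2([2.0, 8.0])     # no shift applied
```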

Open project in Partek Genomics Suite

To open the report, launch Partek Genomics Suite, choose File > Open Project, and browse to the .ppj file. In the Gene Expression workflow, you can then proceed to the add sample attribute step.

Export CNV data to Illumina GenomeStudio using Partek report plug-in

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Import data from Illumina GenomeStudio using Partek plug-in

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

GenomeStudioGenotypePlugin.pdf (PDF, 187KB)

Export methylation data to Illumina GenomeStudio using Partek report plug-in

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

GenomeStudioMethylationPlugin.pdf (PDF, 221KB)

Violin Plot

The Violin plot in Partek Genomics Suite is similar to the Profile Trellis plot in that it displays probe(set)/gene intensity values across samples and genes. However, the Violin plot has additional options not shared by the Profile Trellis plot. Here, we will explore one use case for the Violin plot.

Displaying intensity value ranges for multiple genes grouped by categorical variables

For this example, we will use the data set and lists created in the Gene Expression tutorial. We have a list of 23 genes that are differentially regulated in tissue samples from patients with Down syndrome and normal controls. We want to display the mean intensity values for Down syndrome and normal samples for each of the 23 genes on a single plot. To do this, we first need to filter the probe intensities spreadsheet to include only the intensity values for the 23 genes of interest.

With the probe intensities spreadsheet and the gene list open in the Analysis tab, follow these steps to filter the probe intensities spreadsheet.

Select the probe intensities spreadsheet in the spreadsheet tree; here, it is Down_Syndrome-GE
Select Filter from the main task bar
Select Filter Columns

Figure 1. Invoking filter columns by a list

The Filter Columns dialog will open (Figure 2).

Figure 2. Configuring the Filter Columns dialog to filter by probe set ID

Select your gene list from the Filter base on spreadsheet drop-down menu; here, we selected Down_Syndrome_vs._Normal
Select the column of your gene list that matches the column IDs you want to filter from your probe intensities spreadsheet; here, we selected 2. Probeset ID
Select OK to apply the filter

A black and yellow horizontal bar will appear at the bottom of the spreadsheet. This is the filter indicator showing the proportion of columns (genes/probesets) filtered out (black) and retained (yellow). To continue working with the filtered probeset intensities, we can clone the filtered spreadsheet.
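Conceptually, filtering columns by a list keeps only the columns whose IDs appear in the list and reports how many were retained and filtered out. The Python sketch below shows that idea on a tiny invented table; the probe set names and values are hypothetical, and it does not reproduce the Filter Columns dialog.

```python
def filter_columns(header, rows, keep_ids):
    """Keep only the columns whose header appears in keep_ids."""
    keep = [i for i, name in enumerate(header) if name in keep_ids]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows

# Hypothetical probe intensities: samples on rows, probe sets on columns
header = ["probe_1", "probe_2", "probe_3", "probe_4"]
rows = [
    [7.1, 5.3, 9.0, 6.2],
    [7.4, 5.1, 8.8, 6.0],
]
gene_list_ids = {"probe_2", "probe_4"}  # IDs taken from the gene list

new_header, new_rows = filter_columns(header, rows, gene_list_ids)
retained = len(new_header)               # shown in yellow on the indicator
filtered_out = len(header) - retained    # shown in black on the indicator
```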

Right-click on the filtered probe intensities spreadsheet in the spreadsheet tree
Select Clone... from the pop-up menu (Figure 3)

Figure 3. Cloning a spreadsheet with a filter applied will clone only the retained rows/columns

Name the new spreadsheet; we chose 2
Select OK

The cloned spreadsheet is a temporary file. To ensure we can use it again if we close Partek Genomics Suite, we should save the filtered probe intensities spreadsheet.

Select the save icon
Name the new file; we chose Down_Syndrome_vs_Normal_Probe_Intensities

Now we have a spreadsheet containing only the probe intensity values for our 23 genes of interest (Figure 4).

Figure 4. Filtered probe intensities spreadsheet

We can now invoke the Violin plot. Make sure to have the filtered probe intensities spreadsheet selected (in blue) in the spreadsheet tree as shown (Figure 4).

Select View from the main taskbar
Select Violin Plot from the menu

A Violin Plot tab will open (Figure 5). This plot shows the intensity value ranges of the 23 genes (probe sets) for all samples as violin plots.

Figure 5. Viewing violin plots for 23 genes

Select View from the main taskbar
Select Toggle Properties

We can now see the plot properties panel to the left of the violin plot (Figure 6).

Figure 6. The violin plot can be configured using the plot properties panel

Although it is called the Violin plot, this visualization can also be used to display box and whisker plots, error bar plots, and gradient plots. For this example, we will generate box and whisker plots, summarized by Type (Down syndrome and normal), for each gene.

Select Box and Whisker Plot from the Plot type drop-down menu
Select Type from the Summarize by drop-down menu; this can be any categorical variable
Select Hide legend from Legend Options

The modified plot shows box and whisker plots, Down syndrome samples in red and normal in blue, for each gene (Figure 7).

Figure 7. Viewing average probe intensity values for two groups across 23 genes as box and whisker plots

To improve our view of the gene symbols, we can modify the X-axis legend.

Select X-Axis from the tabs in the plot properties panel
Set Text angle to 90 under Labels
Uncheck Truncate labels under Labels

Figure 8. Configuring the X-axis label

The gene symbol for each column should now be visible (Figure 9). In cases where probe intensities for your genes of interest fall across a wide range, it may be helpful to normalize the probe intensity distributions of each gene. This is equivalent to what is done to display a heat map of probe intensity values.

Figure 9. X-axis now labeled with the gene symbol for each gene

Select the Style tab
Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization options
Select Apply

The box and whisker plots are now centered with a mean of zero and scaled to have a standard deviation of one (Figure 10). Similar to a heat map, this makes it easier to visualize which genes are upregulated and which are downregulated. Here, we can see that most of the 23 genes are expressed more highly in Down syndrome patients.

Figure 10. Viewing normalized box and whisker plots
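The standardization applied above can be sketched outside of Partek Genomics Suite as follows. This is a minimal illustration, not the software's implementation: the function name standardize_columns and the example values are hypothetical, and we assume the sample standard deviation (n - 1 denominator), which the manual does not specify.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Shift each column to a mean of zero and scale it to a standard deviation of one.

    rows: list of samples, each a list of probe intensities (one column per gene).
    Assumption: sample standard deviation (n - 1) is used.
    """
    columns = list(zip(*rows))  # transpose so that each entry is one gene
    scaled_columns = []
    for column in columns:
        mu = mean(column)
        sd = stdev(column)
        # A constant column has no spread, so map it to zeros instead of dividing by zero
        scaled_columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]  # transpose back


# Hypothetical intensities: three samples (rows) by two genes (columns)
intensities = [
    [8.0, 5.0],
    [10.0, 7.0],
    [12.0, 9.0],
]
standardized = standardize_columns(intensities)
```

After standardization, genes measured on very different intensity scales share a common scale, which is why they become directly comparable in one plot.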

Plots can also be split by categorical variables. We can use this to visualize differential expression of genes between Down syndrome and normal patients in different tissue types.

Select Configure profile
Select Switch to Advanced (Figure 11)

Figure 11. Simple options for configuring profiles in the plot

Select Sub-Plot for Tissue (Figure 12)

Figure 12. Configuring plot properties to split by Tissue

Select OK

Several options will need to be reconfigured before we apply this change.

Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization section
Select the X-axis tab
Set Text Angle to 90

There should now be a sub-plot for each category; in this case, there are four sub-plots, one for each tissue (Figure 13). Several plots have no error bars because there are not enough samples in those categories.

Figure 13. Splitting a plot by a categorical factor, Tissue, and grouping by another categorical variable, Type

These sub-plots can be displayed all together, or individually.

Select 1 from the Items/Page drop-down menu

You can now move through the sub-plots by selecting Next >.

Select All from the Items/Page drop-down menu to return to the 2x2 view

This data can also be displayed as a gradient plot (Figure 14) or error bar plot (Figure 15) by changing the Plot type using the drop-down menu in the Style tab. By default, the shading range in the gradient plot and the error bars show +/-1 standard deviation from the mean.

Figure 14. Gradient plot

Figure 15. Error bar plot
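To make the +/-1 standard deviation range concrete, the following sketch computes the mean and standard deviation of one gene for each group. It is only an illustration under stated assumptions: group_mean_sd and the example values are hypothetical, and the sample standard deviation is assumed. It also shows why a group with a single sample has no error bar.

```python
from statistics import mean, stdev


def group_mean_sd(values, groups):
    """Return {group: (mean, sd)} for one gene; sd is None when a group has one sample."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    summary = {}
    for group, group_values in by_group.items():
        # A standard deviation needs at least two samples
        sd = stdev(group_values) if len(group_values) > 1 else None
        summary[group] = (mean(group_values), sd)
    return summary


# Hypothetical probe intensities for one gene in one tissue
values = [6.0, 8.0, 10.0, 7.0]
groups = ["Down syndrome", "Down syndrome", "Down syndrome", "Normal"]
summary = group_mean_sd(values, groups)

for group, (m, sd) in summary.items():
    if sd is None:
        print(f"{group}: mean {m}, no error bar (single sample)")
    else:
        print(f"{group}: mean {m}, range {m - sd} to {m + sd}")
```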

The final option, violin plot, cannot be used to display samples grouped by a categorical variable. To view a violin plot, we must remove the Summarize by selection.

Select (One profile per sub-plot) from the Summarize by drop-down menu
Select Violin plot from the Plot type drop-down menu
Select None - do not adjust values for Normalization

The plot now displays violin plots for each gene showing the distribution of probe intensity values for each tissue in a separate sub-plot (Figure 16).

Figure 16. Violin plots for each gene, sub-plots for each tissue


Starting with a list of genomic regions

Importing a region list

A region list must contain the chromosome, start location, and stop locations as the first three columns. The chromosome number in the region list must be compatible with the genomic annotation for the species if you plan to use any feature (like motif detection) that requires reference sequence information.
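Before importing, it can be useful to check that a text file follows this layout. The sketch below is a hypothetical helper, not part of Partek Genomics Suite: read_region_list, the tab delimiter, the header row, and the chr21 naming are all assumptions made for illustration. It keeps the chromosome as text and reads start and stop as integers.

```python
import csv
import io


def read_region_list(handle, delimiter="\t"):
    """Read a region list whose first three columns are chromosome, start, and stop.

    Assumptions: the file has one header row and the given delimiter.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    regions = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 columns")
        chrom = row[0]  # kept as text, e.g. "chr21" or "21"
        start, stop = int(row[1]), int(row[2])
        if start > stop:
            raise ValueError(f"line {line_no}: start is greater than stop")
        regions.append((chrom, start, stop))
    return header, regions


# Hypothetical region list with one extra annotation column
example = (
    "chromosome\tstart\tstop\tname\n"
    "chr21\t100\t250\tislandA\n"
    "chr21\t400\t900\tislandB\n"
)
header, regions = read_region_list(io.StringIO(example))
```

Whether chromosomes should be written as chr21 or 21 depends on the genomic annotation you plan to use, so check the naming against that annotation.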

Import the region list as described above for text files with the following options
- Select Other for data type
- Set chromosome as a text field
- Set location start and stop as either integer or text fields
Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select List of genomic regions from the Configure Spreadsheet dialog to add region to the properties (Figure 1)

Figure 1. Adding region to the properties of a spreadsheet

The spreadsheet properties will now include region. Alternatively, region can be added as a spreadsheet property from the Configure Genomic Properties dialog by selecting Advanced..., choosing region from the drop-down menu, selecting Add, and selecting OK.

If you would like to do any operation that requires looking up the reference genomic sequence information for the regions based on genomic location, you will need to specify the species for this region list.

Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select species from the Add Property drop-down menu and click Add

Motif detection

Starting with a region list, you may detect either known or de novo motifs using the ChIP-Seq workflow if your spreadsheet has been associated with a species and a reference genome.

Select ChIP-Seq from the Workflows drop-down menu
Select Motif detection from the Peak Analysis section of the workflow

Both Discover de novo motifs and Search for known motifs can be performed. Motif detection requires sequence information for the genome; you can specify either a .2bit file or a .fa file, which can be used to create a .2bit file.

Determining the average values for a region list

If you have a region list or a .BED file and you have a microarray experiment with data, you can summarize the microarray data by the genomic coordinates contained in the region list. For example, suppose the region list contains a list of CpG islands and the experiment contains methylation percentage values for probes (β values); you can then summarize the methylation values of all probes in each CpG island.

Import the region list (or .BED file)

Be sure that you have added the region property. The list of region coordinates (chromosome, start, stop) from the region list will be mapped against the reference genome specified for the microarray data so specifying Species and Genome Build for your region list is unnecessary.

Open the microarray data spreadsheet. This spreadsheet should have an annotation file associated with it, and the annotation file must contain genomic location information.

Samples should be on rows and data on columns in the microarray data spreadsheet.

Select the region list spreadsheet
Right-click any column header in the region list spreadsheet
Select Insert Average from the pop-up menu (Figure 2)

Figure 2. Adding the average values for a region list

Select the microarray data spreadsheet containing the values you want to average for each region from the Get average from spreadsheet drop-down menu

There are three options for averaging the data (Figure 3). Mean of samples significant in region is used when the region list has SampleIDs from the microarray data set associated with each region. In this case, only the microarray data set samples specified for each region would be included in the mean calculation. Mean of all samples will add columns for the mean value of all probes for all samples and the number of probes for all samples in each region. Mean value for all samples separately will add two columns for each sample with the mean value of all probes for that sample and the number of probes for that sample in each region.

We have selected Mean of all samples
Select OK (Figure 3)

Figure 3. Selecting options for adding average values for regions

Columns will be added to the regions list spreadsheet. Here, we have added two columns with the average β-value for all samples in each CpG island and the number of probes in each CpG island (Figure 4).

Figure 4. Added average beta values and number of probes per CpG island
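The Mean of all samples calculation can be sketched as follows. This is an illustration under stated assumptions rather than the software's implementation: mean_of_all_samples and all data values are hypothetical, each probe is assumed to map to a single genomic position, and region coordinates are treated as inclusive.

```python
def mean_of_all_samples(regions, probe_positions, sample_values):
    """Add a mean value and a probe count to every region.

    regions: list of (chromosome, start, stop)
    probe_positions: {probe_id: (chromosome, position)} from the annotation
    sample_values: {sample_id: {probe_id: value}}
    Assumption: start and stop are inclusive.
    """
    results = []
    for chrom, start, stop in regions:
        in_region = [
            probe
            for probe, (probe_chrom, position) in probe_positions.items()
            if probe_chrom == chrom and start <= position <= stop
        ]
        values = [sample_values[s][p] for s in sample_values for p in in_region]
        average = sum(values) / len(values) if values else None
        results.append((chrom, start, stop, average, len(in_region)))
    return results


# Hypothetical CpG islands, probe annotation, and beta values
regions = [("chr21", 100, 250), ("chr21", 400, 900), ("chr1", 1, 10)]
probe_positions = {"p1": ("chr21", 120), "p2": ("chr21", 200), "p3": ("chr21", 500)}
sample_values = {
    "S1": {"p1": 0.2, "p2": 0.4, "p3": 0.8},
    "S2": {"p1": 0.4, "p2": 0.6, "p3": 0.6},
}
averages = mean_of_all_samples(regions, probe_positions, sample_values)
```

A region that contains no probes receives a count of zero and no mean value in this sketch.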

Find region overlaps

If you have two or more region lists with coordinates on the same reference genome, you can compare them to identify overlapping regions.

Open all region list spreadsheets that you want to compare
Select Tools from the main toolbar
Select Find Region Overlaps (Figure 5)

Figure 5. Selecting Find Region Overlaps

The Find Region Overlaps tool has two modes of operation. The first, Report all regions, creates a new spreadsheet with any regions that did not intersect and all regions of intersection between any of the input lists. For each intersection, the start and stop coordinates of the intersection and the percent overlap between the intersected region and each of the regions in the input lists are reported. The second, Only report regions present in all lists, creates a new spreadsheet with the intersected regions found in all the lists.

Select your preferred mode; we have selected Only report regions present in all lists
Select Add New Spreadsheet to add any spreadsheets you want to compare; we are comparing two region list spreadsheets (Figure 6)
Select OK

Figure 6. Configuring Find Overlapping Regions

A new region list spreadsheet will be created (Figure 7). The new region list is a temporary spreadsheet so be sure to save it if you want to keep it.

Figure 7. Spreadsheet with regions present in all lists
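The idea behind reporting regions present in two lists can be sketched as a pairwise interval intersection. This is a simplified illustration, not the tool's algorithm: intersect_regions and the example coordinates are hypothetical, coordinates are treated as inclusive, and percent overlap is assumed to be the intersection length divided by the length of each input region.

```python
def intersect_regions(list_a, list_b):
    """Return intersections between two region lists.

    Each region is (chromosome, start, stop) with inclusive coordinates (assumption).
    Each result is (chromosome, start, stop, percent_of_a, percent_of_b).
    """
    overlaps = []
    for chrom_a, start_a, stop_a in list_a:
        for chrom_b, start_b, stop_b in list_b:
            if chrom_a != chrom_b:
                continue  # regions on different chromosomes never overlap
            start = max(start_a, start_b)
            stop = min(stop_a, stop_b)
            if start <= stop:
                length = stop - start + 1
                percent_a = 100.0 * length / (stop_a - start_a + 1)
                percent_b = 100.0 * length / (stop_b - start_b + 1)
                overlaps.append((chrom_a, start, stop, percent_a, percent_b))
    return overlaps


# Hypothetical region lists on the same reference genome
list_a = [("chr1", 100, 199), ("chr2", 50, 99)]
list_b = [("chr1", 150, 249), ("chr2", 200, 300)]
shared = intersect_regions(list_a, list_b)
```

The nested loop is quadratic in the number of regions, which is fine for a short illustration but not how one would handle genome-scale lists.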

Importing a genomic position list for SNV annotation

To be annotated using the Annotate SNVs tool, an imported SNV position list must have four columns per locus:

Position of the SNP listed as chr.basePosition
Sample ID or name
The reference base
The SNP call (sample genotype base)

Prepare input list as shown (Figure 8) with four columns describing the position, sample, reference base, and sample genotype base for each SNV

Figure 8. An imported SNV list must follow this format to be annotated by the Annotate SNV tool. The first column must be the position and the position must follow the format shown, chr.basePosition

Save as either a tab-separated or comma-separated file
Import the table as a text file
Select Genomic data for What type of data is this file?
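If your SNV calls are produced by a script, the four-column input list can be written programmatically. The sketch below is only an illustration: write_snv_list, the column header names, the sample name, and the chr1 chromosome label are hypothetical, so follow the format shown in Figure 8 for the exact chromosome naming.

```python
import csv
import io


def write_snv_list(handle, snvs):
    """Write a tab-separated SNV list with the position in the first column.

    snvs: iterable of (chromosome, base_position, sample_id, reference_base, genotype_base)
    Assumption: the header names below are placeholders.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Position", "Sample ID", "Reference", "Genotype"])
    for chrom, base_position, sample_id, reference_base, genotype_base in snvs:
        # Position is written as chr.basePosition
        writer.writerow([f"{chrom}.{base_position}", sample_id, reference_base, genotype_base])


# Hypothetical SNV call
buffer = io.StringIO()
write_snv_list(buffer, [("chr1", 12345, "Sample1", "A", "G")])
snv_text = buffer.getvalue()
```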

The Annotate SNVs tool can now be invoked on this spreadsheet to generate an annotation spreadsheet (Figure 9).

Figure 9. Annotate SNVs creates a new spreadsheet annotating each SNV from the source list


Hierarchical Clustering Analysis

What is Hierarchical Clustering?

Hierarchical clustering groups similar objects into clusters. To start, each row and/or column is considered a cluster. The two most similar clusters are then combined and this process is iterated until all objects are in the same cluster. Hierarchical clustering displays the resulting hierarchy of the clusters in a tree called a dendrogram. Hierarchical clustering is useful for exploratory analysis because it shows how samples group together based on similarity of features.

Hierarchical clustering is an unsupervised clustering method. Unsupervised clustering methods do not take the identity or attributes of samples into account when clustering. This means that experimental variables such as treatment, phenotype, tissue, number of expected groups, etc. do not guide or bias cluster building. Supervised clustering methods do consider experimental variables when building clusters.
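The merging procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used by Partek Genomics Suite: hierarchical_clustering and the example data are hypothetical, and we assume Euclidean distance with average linkage, which are common choices but are not stated in this guide.

```python
from itertools import combinations
from math import dist


def hierarchical_clustering(points):
    """Agglomerative clustering; returns the list of merges in the order they occur.

    Assumptions: Euclidean distance between points, average linkage between clusters.
    """
    clusters = [[i] for i in range(len(points))]  # every object starts as its own cluster
    merges = []

    def linkage(a, b):
        # Average distance over all pairs of members from the two clusters
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the two most similar clusters and combine them
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Hypothetical samples described by two features each
samples = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9)]
merges = hierarchical_clustering(samples)
```

The sequence of merges is what a dendrogram draws: early merges join very similar objects near the leaves, and the final merge is the root. No sample labels are used at any point, which is what makes the method unsupervised.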

Visualizing Hierarchical Clustering

To illustrate the capabilities and customization options of hierarchical clustering in Partek Genomics Suite, we will explore an example of hierarchical clustering drawn from the tutorial . The data set in this tutorial includes gene expression data from patients with or without Down syndrome. Using this data set, 23 highly differentially expressed genes between Down syndrome and normal patient tissues were identified. These 23 differentially regulated genes were then used to perform hierarchical clustering of the samples. Follow the steps outlined in to perform hierarchical clustering and launch the Hierarchical Clustering tab (Figure 1).

Figure 1. Heatmap showing results of hierarchical clustering

The right-hand section of the Hierarchical Clustering tab is a heat map showing relative expression of the genes in the list used to perform clustering. The heat map can be configured using the properties panel on the left-hand side of the tab. In this example, the low expression value is colored in green, the high expression value is in red, and the mid-point value between min and max is colored in black. The dendrograms on the left-hand side and top of the heat map show clustering of samples as rows and features (probes/genes in this example) as columns. Columns are labeled with the gene symbol if there is enough space for every gene to be annotated. Rows are colored based on the groups of the first sample categorical attribute in the source spreadsheet. The sample legend below the heat map indicates which colors correspond to which attribute group. In this example, Down syndrome patient samples are red and normal patient samples are orange.

The heat map can be configured using the properties panel on the left-hand side of the Hierarchical clustering tab.

Configuring the Hierarchical Clustering Plot

Labeling Sample Groups in the Heat Map

Select the Rows tab
Verify that Type appears in the annotation box
Set Width (in pixels) to 25

This will increase the width of the color box indicating sample Type.

Select Show Label
Set Text size to 12
Set Text angle to 90

This angle is relative to the x-axis. When set to 90, the text will run along the y-axis.

Select Apply

The sample attributes are now labeled with group titles (Figure 2).

Figure 2. Labeling heat map with sample attribute groups

Adding a Sample Attribute to the Heat Map

Select the Rows tab
Select Tissue from the New Annotation drop-down menu
Select Apply

Color blocks indicating the tissue of each sample have been added to the row labels and sample legend (Figure 3).

Figure 3. Sample attributes can be added to the heat map as sample labels

Changing the Orientation of the Rows and Columns

By default, Partek Genomics Suite displays samples on rows and features on columns. We can transpose the heat map using the Heat Map tab in the plot properties panel.

Select the Heat Map tab
Select Transpose rows and columns in the Orientation section
Select Apply

The plot has been transposed with samples on columns and features on rows. The label for the sample groups is now in the vertical orientation because the settings we applied to Rows have been applied to Columns.

Select the Columns tab
Select the Type track
Set Text angle to 0

The sample group label for Type is now visible (Figure 4).

Figure 4. Heat map columns and rows can be transposed

Flipping Columns or Rows

Each cluster node has two sub-cluster branches (legs), except at the bottom level of the dendrogram. The order of the two branches is arbitrary, so the positions of the two sub-clusters can be flipped within the cluster. This does not change the clustering, only the position of the clusters on the plot.

Select () from the Mouse Mode icon set to activate Flip Mode
Clicking on a line (or drawing a bounding box on a line using the left mouse button) that represents a sub-cluster branch (or dendrogram leg) will flip the selected leg with the other leg within the same parent cluster. In this example, clicking on the bottom line will move it to the top of the heat map (Figure 5).

Figure 5. Rows and columns can be flipped by using Flip Mode to select dendrogram legs
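To see why flipping changes only the display order, consider a dendrogram stored as nested pairs. The sketch below is a hypothetical representation chosen for illustration (the names leaf_order, flip, and cluster_sets are not part of Partek Genomics Suite); it shows that swapping the two legs of a node reorders the leaves while every cluster keeps the same members.

```python
def leaf_order(node):
    """Return the leaves of a dendrogram in the order they would be drawn."""
    if isinstance(node, tuple):
        left, right = node
        return leaf_order(left) + leaf_order(right)
    return [node]


def flip(node):
    """Swap the two legs of a cluster node."""
    left, right = node
    return (right, left)


def cluster_sets(node):
    """Return the membership of every cluster in the tree, ignoring drawing order."""
    if not isinstance(node, tuple):
        return {frozenset([node])}
    left, right = node
    return cluster_sets(left) | cluster_sets(right) | {frozenset(leaf_order(node))}


# Hypothetical dendrogram over five genes
tree = (("A", "B"), ("C", ("D", "E")))
flipped = flip(tree)

print(leaf_order(tree))     # order before flipping the top node
print(leaf_order(flipped))  # order after flipping the top node
```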

Changing Heat Map Colors

The minimum, maximum, and midpoint colors of the heat map intensity plot can be customized.

Select the Heat Map tab
Set Min color to light blue using the color picker tool
Set Max color to yellow using the color picker tool

The heat map and plot intensity legend now show maximum values in yellow and minimum values in light blue with a black midpoint (Figure 6). The data range can also be customized by changing the values of Min and Max.

Figure 6. Heat map colors for minimum, maximum, and midpoint intensity can be customized
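A three-color scale of this kind can be sketched as two linear color ramps that meet at the midpoint. This is an assumption about how such a scale might work, not a description of the software's rendering: heat_map_color, the RGB values for light blue and yellow, and the data range are all hypothetical.

```python
def heat_map_color(value, vmin, vmax, min_color, max_color, mid_color=(0, 0, 0)):
    """Map a value to an RGB color using min, midpoint, and max colors.

    Assumptions: vmin < vmax, the midpoint is halfway between them, and colors
    are interpolated linearly. Values outside the range are clamped.
    """
    mid = (vmin + vmax) / 2
    value = max(vmin, min(vmax, value))
    if value <= mid:
        fraction = (value - vmin) / (mid - vmin)
        start, end = min_color, mid_color
    else:
        fraction = (value - mid) / (vmax - mid)
        start, end = mid_color, max_color
    return tuple(round(s + fraction * (e - s)) for s, e in zip(start, end))


LIGHT_BLUE = (173, 216, 230)  # hypothetical RGB value for the Min color
YELLOW = (255, 255, 0)        # hypothetical RGB value for the Max color

low = heat_map_color(-2.0, -2.0, 2.0, LIGHT_BLUE, YELLOW)
middle = heat_map_color(0.0, -2.0, 2.0, LIGHT_BLUE, YELLOW)
high = heat_map_color(2.0, -2.0, 2.0, LIGHT_BLUE, YELLOW)
```

Narrowing the Min and Max values stretches the same colors over a smaller data range, so values beyond the range all receive the end colors.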

Zooming to Selected Rows/Columns

We can use the hierarchical clustering heat map to examine groups of genes that exhibit similar expression patterns. For example, genes that are up-regulated in Down syndrome samples and down-regulated in normal samples.

Select () from the Mouse Mode icon set to activate Selection Mode
Select the middle cluster of the rows dendrogram as shown (Figure 7) by clicking on the line or drawing a bounding box around the line

The lines within the selected cluster will be bold and the corresponding columns (or rows) on the spreadsheet in the analysis tab will be highlighted.

Figure 7. Selecting a dendrogram cluster using Selection Mode

Right-click anywhere in the viewer
Select Zoom to Fit Selected Rows

The same steps can be used to zoom into columns or rows. Here, we have zoomed in on rows, but not columns to show the expression levels of the selected genes for all samples (Figure 8).

Figure 8. Viewing only selected genes for all samples

To reset the zoom, select () on the y-axis to show all rows and on the x-axis to show all columns.

Select () on the y-axis to show all rows
Left click anywhere in the hierarchical clustering plot to deselect the dendrogram

Exporting a List of Genes From a Selected Cluster

Partek Genomics Suite can export a list of genes from any cluster selected, allowing large gene sets to be filtered based on the results of hierarchical clustering.

Select () from the Mouse Mode icon set to activate Selection Mode
Select the bottom cluster of the rows dendrogram
Right-click to open the pop-up menu

Figure 9. Creating gene list from selected cluster

Name the gene set down in normal
Select OK
Save the list as down in normal

In the Analysis tab, there is now a spreadsheet row_list (down in normal.txt) containing the 6 genes that were in the selected cluster. The same steps can be used to create a list of samples from the hierarchical clustering by selecting clusters on the sample dendrogram.

Saving Plot Properties

Once you have created a customized plot, you can save the plot properties as a template for future hierarchical clustering analyses.

Select the Save/Load tab
Select Save current...
Name the current plot properties template; we selected Transposed Blue and Yellow

The new template now appears in the Save/Load panel as an option. To load a template, select it in the Save/Load panel and select Load selected. Note that all properties will be applied, including Min and Max values and sample groups (based on the column number of the attribute in the source spreadsheet), which may not be appropriate for a different data set.

Exporting the Hierarchical Clustering Plot Image

The hierarchical clustering plot can be exported as a publication-quality image.

Select the Hierarchical Clustering tab
Select File from the main toolbar
Select Save Image As... from the drop-down menu

