
Lists

Scientists often develop lists of genes, probes, transcripts, SNPs, and genomic regions of interest from analysis tools, research papers, and databases. Using Partek Genomics Suite, these lists can be integrated with genomics data sets, analyzed with powerful statistics, and visualized for new insights.

This user guide will illustrate:

  • Importing a text file list

  • Adding annotations to a gene list

This user guide does not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature in Partek Genomics Suite that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a Partek Genomics Suite feature on an imported list that you think should be included in this user guide, please let us know.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

User Manual

Partek Genomics Suite is a comprehensive suite of advanced statistics and interactive data visualization specifically designed to reliably extract biological signals from noise. Designed for high-dimensional genomic studies containing thousands of samples, Partek Genomics Suite is fast, memory efficient and will analyze large data sets on a personal computer. It supports a complete workflow including convenient data access tools, identification and annotation of important biomarkers, and construction and validation of predictive diagnostic classification systems.

Tasks available for a gene list
Starting with a list of genomic regions
Starting with a list of SNPs
Importing a BED file
Additional options for lists

  • Additional information can be found in the manual for Partek Genomics Suite version 6.6.


    Lists
    Annotation
    Hierarchical Clustering Analysis
    Gene Ontology ANOVA
    Visualizations
    Visualizing NGS Data
    Chromosome View
    Methylation Workflows
    Trio/Duo Analysis
    Association Analysis
    LOH detection with an allele ratio spreadsheet
    Import data from Agilent feature extraction software
    Illumina GenomeStudio Plugin
    Partek Genomics Suite & Manual.pdf (PDF, 15MB)

    Importing a text file list

    The preferred method for importing a generic data spreadsheet into Partek Genomics Suite is as a text file. Here, we illustrate importing a list of genes with p-value and fold-change from an experiment comparing two conditions.

    • Select File from the main toolbar

    • Select Import

    • Select Text (.csv .txt)...

    • Select the text file using the file browser to launch the Import .txt, .tsv, or .csv File dialog

    The File Type section of the Import dialog includes a preview of the text file and import options (Figure 1).

    The columns in the import file can be separated by a tab, comma, or any other character.

    For most applications, the items on the list should be in rows while attributes or values should be in columns. If a list is oriented with items on columns, select Transpose the file to import a transposed spreadsheet.
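As a rough illustration of what transposing a list does, here is a minimal sketch in plain Python. It is not Partek code, and the gene names and values are made up for the example.

```python
# Hypothetical list stored with items on columns: each row is an attribute.
rows = [
    ["Gene", "TNNI3", "STX1A"],
    ["p-value", "0.001", "0.04"],
    ["Fold-change", "3.2", "-2.1"],
]


def transpose(table):
    """Swap the rows and columns of a rectangular table."""
    return [list(column) for column in zip(*table)]


# After transposing, each row is one item and each column one attribute.
transposed = transpose(rows)
for row in transposed:
    print("\t".join(row))
```

After transposition, each gene occupies its own row, which is the orientation recommended above.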

    • Select Next > to move to the Data Type section

    • Select your data type; here we have chosen Genomic Data because it is a gene list (Figure 2)

    We have also deselected Is the data log transformed (LOG_base (x+offset) ) ?

    Selecting Genomic Data will open a dialog after import to configure properties for the imported list including selecting the type of genomic data, the location of genomic features in the spreadsheet, the annotation column with gene symbols, the chip or reference source and annotation file, the species, and reference genome build.

    • Select Next >

    The next step is to identify where the data starts and where the optional header is found using Identify Column Labels, Start of Data (Figure 3). The line that contains the header (if present) must precede the data. If there are lines to be skipped in the file (like comments), they may only appear at the top of the file, before the header line or data begin.

    If there are many comment lines at the start of the file, you may need to select View Next 5 Records to get to the row that contains the column header. If you accidentally move past the screen that contains the header or data rows, select View Previous 5 Records.

    If there are missing numerical values or empty cells in your input list, insert a special character or symbol (?, N/A, NA, etc.) in the missing cells. You will specify this symbol in the Missing Data Representation section of the dialog. Only one symbol can be used to represent missing values; the default missing value indicator is ?.

    • If a header row is present, select Col Lbls to allow you to select a column header row

    • Select the row where the data begins using the Begin Data selector

    • If any cells have a missing value, you can signify this with a special symbol selected using the Missing Data Representation panel
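The layout rules above (comment lines only at the top, an optional header before the data, a single missing-value symbol) can be sketched outside of Partek Genomics Suite in plain Python. This is only an illustration of the file layout the importer expects. The function name `read_list`, its parameters, and the sample file contents are assumptions made for this example.

```python
import csv
import io


def read_list(text, label_line, data_line, missing="?", delimiter="\t"):
    """Read a delimited list.

    label_line and data_line are 1-based line numbers, loosely mirroring
    the Col Lbls and Begin Data selectors; earlier lines are skipped.
    Cells equal to `missing` are returned as None.
    """
    lines = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = lines[label_line - 1]
    records = []
    for cells in lines[data_line - 1:]:
        if not cells:
            continue  # ignore blank lines
        values = [None if cell == missing else cell for cell in cells]
        records.append(dict(zip(header, values)))
    return header, records


# Made-up example file: two comment lines, a header, then data rows.
example = (
    "# exported gene list\n"
    "# condition A vs. condition B\n"
    "Gene\tp-value\tFold-change\n"
    "TNNI3\t0.001\t3.2\n"
    "STX1A\t?\t-2.1\n"
)
header, records = read_list(example, label_line=3, data_line=4)
print(header)
print(records[1])
```

Here the header is on line 3 and the data begins on line 4, so the two comment lines are skipped and the `?` cell is treated as missing.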

    The Preview text encoding section (Figure 4) previews the first five lines of the file, allowing you to check if the text encoding is correct.

    • If the text does not appear properly, use the Specify the text encoding: drop-down menu to choose the correct encoding

    • Select Next >

    The final section of the Import .txt, .tsv, or .csv File dialog is Verify Type & Attribute of Data Columns (Figure 5). While data column type and attribute can be modified after import, it is easier and faster to select the proper options during import as multiple columns may be selected during this dialog.

    • Check and modify column types and attributes

    If there is an identifier like gene symbol or SNP, the Type field for that column should be set to text and Attribute should be set to label. Numeric values (intensities, p-values, fold-changes, etc.) should have Type set to double and Attribute set to response. The other possible value for Attribute is factor, which describes sample data. The user interface in this dialog allows you to select multiple columns at once (Ctrl+left click and Shift+left click). The interface controls are detailed in the dialog (Figure 5).
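To check a file before import, the rule of thumb above can be approximated with a short script. This is a hedged sketch, not how Partek Genomics Suite assigns types. The function `guess_column_settings` and the sample columns are hypothetical, and the sketch never suggests factor, because that depends on the experimental design.

```python
def guess_column_settings(values):
    """Suggest (Type, Attribute) for a column from its non-missing values.

    Columns whose values all parse as numbers are suggested as
    double/response; everything else is suggested as text/label.
    """
    present = [v for v in values if v not in (None, "", "?")]
    if not present:
        return ("text", "label")
    try:
        for v in present:
            float(v)
    except ValueError:
        return ("text", "label")
    return ("double", "response")


# Made-up columns from a gene list with p-values and fold-changes.
columns = {
    "Gene": ["TNNI3", "STX1A"],
    "p-value": ["0.001", "?"],
    "Fold-change": ["3.2", "-2.1"],
}
suggested = {name: guess_column_settings(vals) for name, vals in columns.items()}
print(suggested)
```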

    • Select Finish to import the text file and open it as a spreadsheet

    If Genomic Data was selected in the Data Type section, the Configure Genomic Properties dialog will open (Figure 6). These options will be discussed in the next section when we add an annotation file.

    • Select OK

    The imported spreadsheet will open (Figure 7).


    Select Next >

    Figure 1. Import .txt, .tsv, or .csv file dialog
    Figure 2. Selecting the data type
    Figure 3. Identifying column labels and start of data
    Figure 4. Previewing text encoding
    Figure 5. Verifying type and attribute of data columns. While individual column types and attributes can be modified after import, this dialog allows multiple columns to be selected and modified simultaneously.
    Figure 6. Many types of genomic data can be imported into Partek Genomics Suite using the text data file importer. This dialog allows these files to be associated with an annotation file and reference genome.
    Figure 7. An imported .txt data file spreadsheet

    LOH detection with an allele ratio spreadsheet

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

    AlleleRatioLOHDocumentation.pdf (PDF, 110KB)


    Chromosome View

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

    Chromosome Viewer.pdf (PDF, 2MB)


    Additional options for lists

    GO ANOVA, GSEA/GeneSet ANOVA, and Pathway ANOVA

    As these features require intensity (or count) data as well as experimental groups, they cannot be performed on imported lists.

    Integrating imported data

    If the data from imported spreadsheets has been associated with annotations, several integration approaches may be used to integrate multiple kinds of imported data.

    The Genome Browser may be used to display data from multiple spreadsheets/experiments regardless of the type of spreadsheets (imported data or microarray or NGS experiments).

    The Venn Diagram tool may be used to find overlaps based on a feature name.

    The Find Overlapping Regions tool can use an imported gene list and a list of regions from a copy number or ChIP-Seq experiment to identify genomic regions in common.
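The idea behind finding regions in common can be sketched as a simple interval comparison. This is an illustration of the concept only, not the algorithm used by the Find Overlapping Regions tool. The coordinates are made up, and the sketch assumes that region ends are inclusive.

```python
def overlapping_regions(regions_a, regions_b):
    """Return name pairs for regions that share at least one base.

    Each region is (chromosome, start, stop, name) with start <= stop.
    Both ends are treated as inclusive (an assumption of this sketch).
    """
    hits = []
    for chrom_a, start_a, stop_a, name_a in regions_a:
        for chrom_b, start_b, stop_b, name_b in regions_b:
            same_chromosome = chrom_a == chrom_b
            if same_chromosome and start_a <= stop_b and start_b <= stop_a:
                hits.append((name_a, name_b))
    return hits


# Made-up coordinates for illustration only.
genes = [("chr1", 100, 500, "geneA"), ("chr2", 1000, 2000, "geneB")]
amplified = [("chr1", 450, 900, "region1"), ("chr2", 2500, 3000, "region2")]
overlaps = overlapping_regions(genes, amplified)
print(overlaps)
```

Two regions overlap when they lie on the same chromosome and each one starts before the other ends. The nested loop is fine for short lists; large lists would call for sorted or indexed intervals.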

    Additional use cases

    This User Guide did not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a feature on an imported list that you think should be included in this User Guide, please let us know.


    GO ANOVA Visualisations

    There are two main visualizations for use with GO ANOVA outputs:

    • Dot plots used to visualize differential expression of functional groups

    • Profile plots used for visualizing disruption of gene expression patterns within the group

    Dot Plots

    Dot plots represent each sample with a single dot. The position of each dot is calculated as the average expression of all genes included in the functional group. Invoke this plot by right clicking on the row header of a functional group of interest and choosing Dot Plot (Orig. Data). The color, shape, and size of the dots can be set to represent sample information in the plot properties dialogue, invoked by pressing on the red ball in the upper left.

    Figure 1 shows a dot plot for a GO category "cell growth involved in cardiac muscle cell development", which is expressed in the heart at a level of almost four times that of the brain, evidenced by the difference of just under two units on the y-axis (in the current example the values on the y-axis are shown in log2 space). Note that the replicates are grouped neatly, making this category highly significant. That is not a surprise, given that the genes belonging to that category are likely very specific for the heart.

    Figure 1. Dot plot of a significantly differentially expressed GO category. Each dot is a sample, box-and-whiskers summarize groups
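The arithmetic behind the dot plot can be sketched as follows: each dot is the mean of the group's gene values for one sample, and a difference of d units in log2 space corresponds to a 2^d fold difference. The expression values and sample names below are made up and are not taken from the example data set.

```python
from statistics import mean

# Made-up log2 expression values: one list of gene values per sample.
samples = {
    "heart_1": [9.0, 10.0, 11.0],
    "heart_2": [9.2, 10.2, 11.2],
    "brain_1": [7.1, 8.1, 9.1],
    "brain_2": [7.3, 8.3, 9.3],
}

# One dot per sample: the mean over all genes in the functional group.
dots = {sample: mean(values) for sample, values in samples.items()}

heart = mean(v for s, v in dots.items() if s.startswith("heart"))
brain = mean(v for s, v in dots.items() if s.startswith("brain"))

log2_difference = heart - brain
fold = 2 ** log2_difference  # a log2 difference of d is a 2**d fold difference
print(round(log2_difference, 2), round(fold, 2))
```

With these invented numbers the tissues differ by just under two log2 units, which is just under a four-fold difference, matching the reading of the y-axis described above.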

    Profile Plots

    Profile plots or profiles represent each category of one of the GO ANOVA factors as a few overlapping lines. Horizontal coordinates refer to individual genes or probes in the original data. Vertical coordinates represent the expression of the individual gene. Invoke this plot by right clicking on the row header of a functional group of interest and choosing Profile (Orig. Data). This plot is useful as the pattern of gene expression in the group is displayed as a line. If the pattern is conserved across treatments, the lines will lie parallel, but if a gene reacts differently, the lines will follow a different pattern and may even cross each other.

    The profile plot in Figure 2 visualizes a GO category without differential expression, but with significant disruption. Note that the gene TNNI3 is up-regulated in the heart, while STX1A is down-regulated in the heart.

    Figure 2. Profile plot of a GO category with significant disruption but not differential expression. Each data point is a gene (error bars are standard error of the mean)

    hashtag
    Additional Assistance

    If you need additional assistance, please visit to submit a help ticket or find phone numbers for regional support.

    Performing GO ANOVA

    Preparing a data set for analysis requires importing the data, normalizing the data as appropriate for standard gene expression analysis, and inserting columns containing the experimental variables. See our tutorial page for more details about preparing data. It is not necessary to perform a differential analysis of gene expression before GO ANOVA.

    For the sake of example, the following walkthrough will consider an experiment that has been imported which includes two different tissues, brain tissue and heart tissue, extracted from a small set of patients.

    The GO ANOVA function is available in the Gene Expression, microRNA Expression, RNA-Seq, and miRNA-Seq workflows.

    • Select the Gene Expression

    Visualizations

    This user guide illustrates:

    GO ANOVA Output

    GO ANOVA output is very similar to standard ANOVA output except each row in the resulting sheet contains statistical results from a single GO functional group rather than a single gene. Columns can be broken down into four sections:

    • Annotations contain detail about the category being considered

    • ANOVA results contain the significance of the effect of the factors in the model

    Import data from Agilent feature extraction software

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.


    Recommended Filters

    When looking for simple differential expression, sorting the factor p-values in ascending order is ideal. This will find the groups that differ most significantly across all the contained genes. To find groups that are less likely to be called by chance, it may be wise to filter to groups with a minimum of 4 or 5 genes (Figure 1). Simple filters can be applied using the interactive filter, available from the button on the toolbar at the top of the screen.

    If there is more than one factor in the model, more complex criteria combining the factors can be specified using the Advanced tab of Tools > List Manager. For example, to find categories that are significant and changed by at least two-fold, make two criteria, one for a low p-value and the other for a minimum two-fold change, and take the intersection of the two criteria.

    Figure 1. Top ten functional groups sorted by the Tissue p-value after filtering to a minimum of five genes in the GO category. Note that most of the groups can be directly related to the heart muscle
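The intersection of criteria described above can be expressed as a short filter. This sketch only illustrates the logic and does not use the List Manager. The field names, the sample rows, and the 0.05 p-value cutoff are assumptions made for the example.

```python
# Made-up GO ANOVA rows; the field names are hypothetical.
rows = [
    {"go": "GO:A", "genes": 12, "p_value": 0.0004, "fold_change": 3.1},
    {"go": "GO:B", "genes": 3, "p_value": 0.0001, "fold_change": -4.0},
    {"go": "GO:C", "genes": 8, "p_value": 0.2, "fold_change": 2.5},
    {"go": "GO:D", "genes": 6, "p_value": 0.003, "fold_change": -1.2},
]


def passes_size(row, minimum=5):
    """Keep categories with at least `minimum` genes."""
    return row["genes"] >= minimum


def passes_p(row, cutoff=0.05):
    """Keep categories with a low p-value (the cutoff is an assumption)."""
    return row["p_value"] <= cutoff


def passes_fold(row, minimum=2.0):
    """Keep categories changed at least two-fold in either direction."""
    return abs(row["fold_change"]) >= minimum


# Intersection of the criteria, then ascending sort on the p-value.
selected = [r for r in rows if passes_size(r) and passes_p(r) and passes_fold(r)]
selected.sort(key=lambda r: r["p_value"])
selected_ids = [r["go"] for r in selected]
print(selected_ids)
```

Because fold changes are signed, the fold-change criterion compares the absolute value so that both increases and decreases are retained.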

    If the disruption (factor*gene interaction) is tested, the filters can become more complicated. The most pressing need for complex filters is that when analyzing larger functional groups it is not expected that the entire functional group will behave the same. Looking back at Figure 1, notice how the low values in column 7 are present because not every gene is equally differentially expressed even in the most differentially expressed of groups. That is, when there is significant differential expression, it is likely that there will also be disruption as at least a single gene is likely participating in a role beyond that of the functional group and will not follow the pattern of the rest of the group. This situation is expected and leads to a new type of filter.

    Profile Plot

    The profile plot displays probe(set)/gene intensity values across samples and genes.

    We will invoke a profile plot from a gene list child spreadsheet with genes on rows.

    • Select the rows to be visualized

    • Right-click on a row header of one of the selected rows

    Association Analysis

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.


    Visualizing NGS Data

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.


    Import data from Illumina GenomeStudio using Partek plug-in

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.


    ImportingAgilentDataintoPartek.pdf (PDF, 193KB)
    Using the Association Workflow.pdf (PDF, 1MB)
    Visualizations of Next Generation Sequencing Data.pdf (PDF, 823KB)
    GenomeStudioGenotypePlugin.pdf (PDF, 187KB)

    Methylation Workflows

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

    Methylation User Guide.pdf (PDF, 582KB)


    Export CNV data to Illumina GenomeStudio using Partek report plug-in

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

    GenomeStudioGeneExpressionPlugin.pdf (PDF, 299KB)


    workflow (or any of the other ones) from the Workflows drop-down on the upper right of the spread sheet
  • Go to the Biological Interpretation section of the workflow

  • Select Gene Set Analysis (Figure 1) and then Gene Set ANOVA

    Figure 1. The GO ANOVA dialog can be invoked via the Gene Set Analysis option of the workflow

    For this example analysis, the model was kept easy to interpret by including Subject and Tissue as the only ANOVA factors. Additionally, Tissue was added to the Disruption Factor(s). Including Subject controlled for person-to-person variation, and including Tissue allowed the analysis of differential expression and of functional category disruption between tissue types. For the sake of simplicity and to minimize run time, the term Subject was not added to the Disruption Factor(s) box. Including it would have helped correct for subject-specific gene expression patterns, though the results were largely unaffected in this case.

    Performing GO ANOVA analysis on very large GO categories can take quite a bit of time. More importantly, very large categories may have too large a scope to be useful. To speed the operation and analyze only smaller GO categories, specify 20 genes as the maximum size for an analyzed GO category.
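The effect of a maximum category size can be pictured as a simple size filter applied before analysis. The mapping below is made up, and treating the limit as inclusive (categories with exactly 20 genes are kept) is an assumption of this sketch rather than documented behavior.

```python
# Made-up mapping from GO category to its member probes or genes.
categories = {
    "GO:0000001": ["g%d" % i for i in range(8)],
    "GO:0000002": ["g%d" % i for i in range(150)],
    "GO:0000003": ["g%d" % i for i in range(20)],
}

MAX_GENES = 20  # the maximum category size suggested in the text

# Keep only the categories small enough to analyze (inclusive limit assumed).
analyzed = {
    go_id: members
    for go_id, members in categories.items()
    if len(members) <= MAX_GENES
}
print(sorted(analyzed))
```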

    For the sample dataset, the GO ANOVA dialog setup should appear as in Figure 2 below.

    Figure 2. GO ANOVA configured for the user guide data set. Two factors added to the model



    Volcano Plot

  • Scatter Plot and MA Plot

  • Sort Rows by Prototype

  • Manhattan Plot

  • Violin Plot

    This user guide assumes the user is familiar with the hierarchy of spreadsheets and analysis in Partek Genomics Suite.

    Many plots available in Partek Genomics Suite are not discussed in this user guide. A more thorough review of Partek Genomics Suite visualizations can be found in Chapter 6: The Pattern Visualization System of the Partek User's Manual available from Help > User’s Manual in the Partek Genomics Suite main toolbar.

    There is no specific data set for this tutorial. You may use one of your own microarray experiments or use a data set from one of our tutorials.

    Visualizations are generated using data from a spreadsheet. Some visualizations allow interactive filtering on the plot, but others do not. If you only wish to include certain rows or columns in a visualization, you may need to create a spreadsheet with only the rows or columns of interest by applying a filter and cloning the spreadsheet.

    In general, probe(set)/gene intensity values may be visualized from either an ANOVA spreadsheet or a filtered ANOVA spreadsheet. Because intensity data is stored in the parent spreadsheet, the parent and child spreadsheets should be visible in the spreadsheet navigator with the appropriate parent/child relationship (Figure 1).

    Figure 1. Down_Syndrome-GE is the parent spreadsheet; ANOVAResults and A are child spreadsheets of Down_Syndrome-GE


    Dot Plot
    Profile Plot
    XY Plot / Bar Chart
    Contrast results contain significance and fold change of the difference between groups compared via contrast
  • F-ratios display the significance of the factors in the ANOVA model

    Annotations

    Annotations will take up the first four columns of the results sheet (Figure 1). The first column (# of genes) is the number of genes in the GO category. This is not necessarily the number of unique genes in the category; depending on the technology, it can be the number of probes or probe sets on the microarray whose targets fall into the GO category. Genes targeted more than once will be counted more than once. The second column (GO ID) is the unique numeric identifier of the GO category; it is sometimes useful for searching when the GO category has a very long name. The third column is the type of the GO category, while the fourth column (GO Description) is the name of the GO category.

    Figure 1. GO ANOVA annotation columns (example)
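The difference between counting probe sets and counting unique genes can be shown with a small example. The probe set IDs, the GO IDs, and the `count_members` helper are all made up for illustration.

```python
# Made-up probe set annotations: probe set ID -> (gene symbol, GO IDs).
probesets = {
    "ps1": ("TNNI3", {"GO:X"}),
    "ps2": ("TNNI3", {"GO:X"}),
    "ps3": ("STX1A", {"GO:X", "GO:Y"}),
}


def count_members(go_id):
    """Return (number of probe sets, number of unique genes) in a category."""
    symbols = [
        symbol for symbol, go_ids in probesets.values() if go_id in go_ids
    ]
    return len(symbols), len(set(symbols))


print(count_members("GO:X"))
```

In this example TNNI3 is targeted by two probe sets, so the category would be reported with three members even though it contains only two unique genes.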

    Right-click on any row header and choose Create Gene List to generate a new spreadsheet containing the list of genes (probes/probe sets) within the selected GO category.

    ANOVA Results

    ANOVA results will include a column for each factor in the setup (Figure 2). A column named with the factor or interaction followed by p-value reports how significant the effect of that variable is on the data. A lower p-value corresponds to a more significant effect. For example, a p-value of 0.1 for tissue means that, if the tissues were in fact equivalent, a difference at least as large as the one observed would be expected about 10% of the time, given the inherent variability of the measurements of the genes in the functional group. A p-value of 0 occurs when the value is too small to be displayed. This can be caused by a very low estimate of inherent variability due to either a very small number of replicates or severely unbalanced data.

    Figure 2. Viewing the GO ANOVA result

    In the example experiment, a low p-value for tissue would imply the functional group is differentially expressed across tissues.

    A low p-value for an interaction implies that the effect of one factor on the other is significant. In the example dataset, no interactions between two main variables were included as factors. To illustrate what the interaction p-value would mean, consider the case that a drug compound and a control injection were dosed over several time points and an interaction between injection compound and time point was included in the GO ANOVA. A low p-value for the drug-time point interaction corresponds to the effect of drug on the functional group being altered with time.

    A column will also be present for each factor placed in the Disruption Factor(s) box. This column will have the header Disruption(Factor name). A low p-value in this column indicates that the different states present different gene patterns within the functional group. For functional groups containing only a single gene, no value will be present as the pattern cannot change. In the example experiment, a low p-value for Disruption(Tissue) identifies functional categories which have different genes operating in the heart and in the brain.

    Contrast Results

    Contrast results include four columns for each of the comparisons declared during GO ANOVA setup. The first column contains the p-value representing the significance of the difference between the two categories. The second column contains the ratio between the two groups where increases are represented as greater than one and decreases are represented as values between zero and one. The third column is the fold change of the functional group between the two categories where increases are greater than one and decreases are less than negative one. The fourth column contains a plain text description of the direction of the fold change. Fold changes and ratios represent the average change in the functional category. In the example, a contrast was run comparing expression in the cerebral tissue to the heart tissue (Figure 3). As these were the only tissues, the p-values are identical to those in column 5. While the p-value column shows which groups are differentially expressed between the tissues, the fold change columns allow us to see by how much they are differentially expressed. Using the sign of the fold change, or the description column, you can see which categories are increased in brain and which are increased in heart.

    Figure 3. Viewing the GO ANOVA contrast columns
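The relationship between the ratio and fold change columns described above can be written out explicitly: ratios of one or more are kept as they are, and ratios between zero and one become the negative reciprocal. The function names and the wording of the description strings are hypothetical and are not taken from the software.

```python
def ratio_to_fold_change(ratio):
    """Convert a positive ratio to the signed fold change described above."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


def describe(ratio, group_a="brain", group_b="heart"):
    """Plain-text direction of change (the wording is illustrative only)."""
    if ratio > 1:
        return f"{group_a} up vs {group_b}"
    if ratio < 1:
        return f"{group_a} down vs {group_b}"
    return "no change"


for r in (4.0, 1.0, 0.25):
    print(r, ratio_to_fold_change(r), describe(r))
```

A ratio of 0.25 therefore corresponds to a fold change of -4, which is why fold changes never fall between -1 and 1.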


    F-Ratios

    F-ratios (Figure 4) are used in the computation of p-values. The values in the columns can safely be ignored by most users; there are exceptional cases when the F-ratios may be informative. To see the general significance of the factors included in the model, a Sources of Variation plot can be computed from these values from the View menu (or the Workflow). The higher the average F-ratio, the more important the factor is to the model on average.

    Figure 4. Viewing the GO ANOVA F-ratios
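The idea of comparing factors by their average F-ratio can be sketched as below. This only illustrates the averaging step. The exact computation behind the Sources of Variation plot is not described here, and the F-ratio values are made up.

```python
from statistics import mean

# Made-up F-ratios: one row per GO category, one entry per model factor.
f_ratios = [
    {"Tissue": 40.0, "Subject": 2.0},
    {"Tissue": 12.0, "Subject": 1.0},
    {"Tissue": 8.0, "Subject": 3.0},
]

# Average each factor's F-ratio across all categories.
factors = list(f_ratios[0])
average_f = {f: mean(row[f] for row in f_ratios) for f in factors}

# Rank factors from most to least important on average.
ranking = sorted(average_f, key=average_f.get, reverse=True)
print(average_f, ranking)
```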


    Filtering for low p-values on the factor and then filtering for low p-values on the factor interacted with gene will find groups that are differentially expressed, but contain at least a few genes that are either disrupted due to treatment, or simply are involved in additional functional groups beyond the scope of the one being analyzed. This list often contains some of the more informative big picture functional groups.

    Figure 2. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category. By prioritizing by the disruption column this type of a list is more "big picture"

    If looking for groups that are not so much differentially expressed but instead express different genes under different treatments, filter for low disruption p-values and high factor p-values. As shown in Figure 2, large or diverse groups that are differentially expressed will often exhibit significant disruption. In fact, a group that is differentially expressed but includes even a single gene that is not changed will have very significant disruption. These situations are certainly notable, but are distracting if looking for functional groups that are instead uniquely patterned based on treatment. By filtering out the groups with low p-values for the factor and then looking at the remaining groups with low p-values for disruption, the groups observed usually have very distinct patterns of expression (Figure 3).

    Figure 3. Top ten functional categories sorted by Disruption(Tissue) p-value after filtering to a minimum of five genes in the GO category and minimum Tissue p-value of 0.3. This list is especially interesting, as using enrichment alone to detect such categories would require a lot of labour.
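The filter behind Figure 3 can be sketched as follows, using the minimum of five genes and the minimum Tissue p-value of 0.3 from the caption. The rows, the field names, and the 0.05 cutoff for a low disruption p-value are assumptions made for this example.

```python
# Made-up GO ANOVA rows; the field names are hypothetical.
rows = [
    {"go": "GO:A", "genes": 9, "tissue_p": 0.0002, "disruption_p": 1e-8},
    {"go": "GO:B", "genes": 7, "tissue_p": 0.62, "disruption_p": 3e-6},
    {"go": "GO:C", "genes": 4, "tissue_p": 0.80, "disruption_p": 1e-7},
    {"go": "GO:D", "genes": 11, "tissue_p": 0.45, "disruption_p": 0.4},
    {"go": "GO:E", "genes": 6, "tissue_p": 0.31, "disruption_p": 2e-4},
]

MIN_GENES = 5            # minimum category size used for Figure 3
MIN_TISSUE_P = 0.3       # keep categories that are not differentially expressed
MAX_DISRUPTION_P = 0.05  # assumed cutoff for a low disruption p-value

kept = [
    r for r in rows
    if r["genes"] >= MIN_GENES
    and r["tissue_p"] >= MIN_TISSUE_P
    and r["disruption_p"] <= MAX_DISRUPTION_P
]

# Sort by ascending disruption p-value and keep the top ten.
kept.sort(key=lambda r: r["disruption_p"])
top_ten = [r["go"] for r in kept[:10]]
print(top_ten)
```

GO:A is dropped despite its very low disruption p-value because it is also strongly differentially expressed, which is exactly the kind of group this filter is meant to set aside.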


    Select Profile Plot (Orig. Data) from the pop-up menu (Figure 1)

    Figure 1. Selecting Profile Plot for selected rows

    The profile plot will be displayed in a new tab (Figure 2). Lines are probe(sets)/genes and columns are samples from the parent spreadsheet.

    Figure 2. Basic profile plot. Each line represents a different probe(set)/gene; each column represents a sample from the parent spreadsheet

    A basic profile plot will likely need customization. The plot configuration, properties, and control options are the same as shown for a dot plot. We will illustrate a few modifications here.

    We can change the row labels to show each sample ID.

    • Select ()

    • Select the Axes tab

    • Set Grid to 1

    • Select Rotate X-Axis Labels and set to 90 degrees (rotates counter-clockwise)

    • Set Label Format to Column and select 5. Subject

    We can add symbols to show which group each sample belongs to.

    • From the Shape by drop-down menu, select 3.Type

    • Select OK

    Symbols have now been added to each profile line plot (Figure 3).

    Figure 3. The profile plot can be modified to facilitate analysis or presentation

    Note that samples present on the parent spreadsheet cannot be excluded from the profile plot. To plot only a subset of the samples you must filter the parent spreadsheet.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Implementation Details

    The method used to detect changes in functional groups is ANOVA. For detailed information about ANOVA, see Chapter 11 of the Partek User Manual. There is one result per functional group based on the expression of all the genes contained in the group. Besides all the factors specified in the ANOVA model, the following extra terms will be added to the model by Partek Genomics Suite automatically:

    • Gene ID - Since not all genes in a functional group express at the same level, gene ID is added to the model to account for gene-to-gene differences

    • Factor * Gene ID (optional) - Interaction of gene ID with the factor can be added to detect changes in the expression pattern within a GO category across the levels of the factor; this document refers to such a change as disruption of the category's expression pattern, or simply disruption

    Suppose an experiment aims to find genes that are differentially expressed between two tissues. Two different tissues are taken from each patient, and a paired-sample t-test or a 2-way ANOVA can be used to analyze the data. The GO ANOVA dialog allows you to specify the ANOVA model, which includes two factors: tissue and participant ID. The analysis is performed at the gene level, but the result is displayed at the level of the functional group by averaging the member genes’ results. The equation of the model that can be specified is:

    y = µ + T + P + ε

    • y: expression of a functional group

    • µ: average expression of the functional group

    • T: tissue-to-tissue effect

    • P: participant-to-participant effect (a random effect)

    • ε: error term

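    When tissue is allowed to interact with gene ID, the ANOVA model becomes more complicated, as shown below. The functional group result is no longer derived explicitly by averaging the member genes, because the new model includes terms for both gene-level and group-level results:

    y = µ + T + P + G + T*G + ε

    • y: expression of a functional group

    • µ: average expression of the functional group

    • T: tissue-to-tissue effect

    • P: patient-to-patient effect (this can be specified as a random effect)

    • G: gene-to-gene effect (differential expression of genes within the functional group independent of tissue type)

    • T*G: Tissue-Gene interaction (differential patterning of gene expression in different tissue types)

    • ε: error term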

    If more than one data column maps to the same gene symbol, Partek Genomics Suite will assume that the markers target different isoforms and will not treat the two markers as replicates of the same gene. Instead, each column is treated as a gene unto itself.
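    To make the Tissue-Gene interaction term concrete, here is a minimal Python sketch on a hypothetical, balanced table of per-tissue gene means. It omits the participant and error terms and is not the ANOVA that Partek Genomics Suite fits; it only shows how a group can have no tissue main effect while still having a non-zero T*G pattern (disruption). All names and values are made up.

```python
from statistics import mean

# Hypothetical mean log2 expression of three genes from one functional group.
# expr[tissue][gene]; a balanced layout is assumed for simplicity.
expr = {
    "brain": {"g1": 8.0, "g2": 6.0, "g3": 7.0},
    "liver": {"g1": 6.0, "g2": 8.0, "g3": 7.0},
}


def interaction_effects(table):
    """Return T*G effects: cell mean - tissue mean - gene mean + grand mean."""
    tissues = list(table)
    genes = list(next(iter(table.values())))
    grand = mean(table[t][g] for t in tissues for g in genes)
    tissue_mean = {t: mean(table[t][g] for g in genes) for t in tissues}
    gene_mean = {g: mean(table[t][g] for t in tissues) for g in genes}
    return {
        (t, g): table[t][g] - tissue_mean[t] - gene_mean[g] + grand
        for t in tissues
        for g in genes
    }


effects = interaction_effects(expr)
for (tissue, gene), value in effects.items():
    print(tissue, gene, value)
```

    In this toy table both tissues average 7.0 across the group, so the group as a whole is not differentially expressed, yet g1 and g2 swap roles between tissues and therefore carry interaction effects of +1 and -1.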

    If there are only two samples in the spreadsheet, Partek Genomics Suite cannot calculate a type-by-gene ID interaction. In this case, the result spreadsheet will contain a column labeled Disruption score. First, for each gene in the functional group, Partek Genomics Suite calculates the difference between the two samples. A z-test is then used to compare the difference for each gene with the differences for the rest of the genes in the functional group. The disruption score is the minimum p-value from these z-tests. A low disruption score therefore indicates that at least one gene behaves differently from the rest. This implies a change in the pattern of gene expression within the functional group and potential disruption of the normal operation of the group. The category as a whole may or may not exhibit differential expression in addition to the disruption.
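    The two-sample disruption score can be sketched roughly as below. The text does not give the exact form of the z-test, so this Python sketch makes its own assumptions: each gene's difference is standardized against the mean and sample standard deviation of the remaining genes' differences, and the p-value is two-sided. Treat it as an illustration of the idea rather than Partek's calculation; the function name and data are hypothetical.

```python
import math
from statistics import mean, stdev


def disruption_score(sample_a, sample_b):
    """Minimum two-sided z-test p-value over genes (assumed test form)."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    if len(diffs) < 3:
        raise ValueError("need at least three genes to compare one against the rest")
    p_values = []
    for i, diff in enumerate(diffs):
        rest = diffs[:i] + diffs[i + 1:]
        spread = stdev(rest)
        if spread == 0:
            # Degenerate case: the other genes all changed by the same amount.
            p_values.append(1.0 if diff == rest[0] else 0.0)
            continue
        z = (diff - mean(rest)) / spread
        # Two-sided tail probability of a standard normal variable.
        p_values.append(math.erfc(abs(z) / math.sqrt(2)))
    return min(p_values)


# Hypothetical log2 intensities for four genes in two samples.
consistent = disruption_score([5.0, 6.0, 7.0, 8.0], [4.0, 5.1, 5.9, 7.0])
one_outlier = disruption_score([5.0, 6.0, 7.0, 8.0], [4.0, 5.1, 5.9, 4.0])
print(consistent, one_outlier)
```

    The second call, where a single gene changes far more than the others, yields a much smaller score than the first, where all four genes shift by about the same amount.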

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Illumina GenomeStudio Plugin

    The GenomeStudio plug-in lets you export data into a project that can be opened directly in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina data into Partek Genomics Suite.

    • Import gene expression data

    • Import Genotype Data

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Gene Ontology ANOVA

    With gene ontology (GO) ANOVA, Partek Genomics Suite includes the ability to use rigorous statistical analysis to find differentially expressed functional groupings of genes. Leveraging the Gene Ontology database, Partek Genomics Suite can organize genes into functional groups. GO ANOVA can detect not only up- and down-regulated functional groups, but also functional groups in which a few genes are disrupted as a result of treatment. Moreover, the common vocabulary of the GO effort enables this analysis to be compared across all types of gene expression data, including those from other species. Traditional tests, such as GO enrichment, require defining filtered lists of differentially expressed genes followed by an analysis of functional groups related to those genes. GO ANOVA, on the other hand, is performed directly after data import and normalization. This minimizes the risk that a highly stringent filter will cause important functional groups to be overlooked.

    Other tests, such as gene set enrichment analysis (GSEA), tolerate minimal or no pre-filtering. However, these tests are very limited in their ability to integrate complicated experimental designs. GSEA, for example, can only handle two groups at a time. GO ANOVA, on the other hand, can leverage the wealth of sample information collected and use powerful multi-factor ANOVA statistics to analyze very complex interactions and regulatory events. The analysis output includes detailed statistical results specifying the effect and importance of phenotypic information on differential expression and subsequent disruption of Gene Ontology functional categories. Furthermore, GSEA calculates enrichment scores using a running-sum statistic on a ranked gene list. GO ANOVA takes into account more information by utilizing each sample’s expression values to calculate the enrichment score.

    Note that the same principles apply to Pathway ANOVA, the only difference being the mapping file; GO ANOVA organizes genes into GO categories, while Pathway ANOVA looks at pathways.

    This user guide deals with the following topics:

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Manhattan Plot

    The Manhattan plot is a common way to visualize p-values or log-odds ratios for GWAS studies across genomic coordinates.

    The starting point for a Manhattan plot is a spreadsheet with SNPs on rows and p-values or log-odds ratios in a column. If beginning with p-values, you will need to convert the p-values to -log10(p-value).

    • Select the column with p-values

    • Select Transform from the main toolbar

    Adding annotations to a gene list

    There are many useful visualizations, annotations, and biological interpretation tools that can operate on a gene list. For these features to work with an imported list, an annotation file must be associated with the gene list. Additionally, many operations that work with a list of significant genes (like GO or Pathway Enrichment) require comparison against a background of “non-significant” genes. The quickest way to accomplish both is to use the background of “all genes” for that organism provided by an annotation source like RefSeq, Ensembl, etc. in .pannot (Partek annotation), .gff, .gtf, .bed, or tab- or comma-delimited format. If the file is not already in a tab- or comma-delimited format, you may import, modify, and save the file in the proper file format.

    Starting with a list of SNPs

    A list of SNPs using dbSNP IDs can be imported as a text file and associated with an annotation file as described for a list of genes. The annotation file you use to annotate the SNPs should minimally contain the chromosome number and physical position of each locus.

    Novel SNPs or SNPs that are not found in your annotation source must be imported as a region list. For this, follow the procedure outlined in Starting with a list of genomic regions, but use the SNP name in place of a region name.

    Annotating SNPs with genes

    Starting with a list of SNPs that have been associated with genomic loci using an annotation file and assigned a species with genome build, you can use Find Overlapping Genes

    Import Genotype Data

    This user guide describes how to export copy number and genotype data using Partek's Report Plug-in for Illumina GenomeStudio Genotype Module for use in Partek Genomics Suite. The GenomeStudio plug-in lets you export data into a project that can be opened directly in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina data into Partek Genomics Suite.

    Partek Genotype plug-in installation

    Download the plug-in zip file

    Unzip the file; it contains a folder called PartekReport, which holds two .dll files: Partek.Common.dll

    Dot Plot

    The primary use of the dot plot is visualizing intensity values across samples.

    We will invoke a dot plot from a gene list child spreadsheet with genes on rows.

    • Right-click on the row header of the gene you want to visualize

    • Select Dot Plot from the pop-up menu (Figure 1)

    Import gene expression data

    This user guide describes how to export gene expression data using Partek's Report Plug-in for Illumina GenomeStudio Gene Expression Module for use in Partek Genomics Suite. The GenomeStudio plug-in lets you export data into a project that can be directly opened in Partek Genomics Suite. It is the fastest and most consistent way to get fully annotated Illumina gene expression data into Partek Genomics Suite.

    Partek Gene Expression plug-in installation

    Download the plug-in zip file

    Unzip the file; it contains a folder called PartekReport, which holds two .dll files: Partek.Common.dll

    P: participant-to-participant effect (a random effect)
  • ε: error term

  • P: patient-to-patient effect (this can be specified as a random effect)
  • G: gene-to-gene effect (differential expression of genes within the function group independent of tissue type)

  • T*G: Tissue-Gene interaction (differential patterning of gene expression in different tissue types)

  • ε: error term

    Export CNV data to Illumina GenomeStudio using Partek report plug-in
    Import data from Illumina GenomeStudio using Partek plug-in
    Export methylation data to Illumina GenomeStudio using Partek report plug-in
    GO ANOVA Output
  • GO ANOVA Visualisations

  • Recommended Filters

  • Implementation Details
    Configuring the GO ANOVA Dialog
    Performing GO ANOVA

    • Select Normalization & Scaling

    • Select On Columns...

    • In the Normalization tab, set Base of the Log(x + offset) to 10

    • Select OK

    • Go to Transform > Normalization & Scaling > On Columns... again

    • Select the Add/Mul/Sub/Div tab

    • Set Multiply by Constant to -1

    • Select OK

    The column now contains -log10(p-value).
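    For reference, the two transformation steps above (log base 10, then multiplication by -1) amount to the following arithmetic. This is a small Python sketch with made-up p-values, not a Partek script.

```python
import math

# Hypothetical p-values for four SNPs.
p_values = [0.05, 1e-8, 0.5, 1.0]

# Step 1: log base 10 of each value; step 2: multiply the result by -1.
log10_values = [math.log10(p) for p in p_values]
neg_log10_p = [-1 * value for value in log10_values]

for p, score in zip(p_values, neg_log10_p):
    print(p, round(score, 3))
```

    Smaller p-values give larger -log10 values, which is why the most significant SNPs appear at the top of a Manhattan plot. A p-value of exactly 0 has no finite logarithm and would need special handling before this step.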

    We can now invoke the initial plot.

    • Select View from the main toolbar

    • Select Genome View

    The Genome View tab will open. This plot will need to be configured.

    • Select () from the plot command bar

    • Select the Profiles tab

    • Remove any unwanted profiles

    • Select Add profile

    • Select Column

    • Select the column with the -log10(p-value) or log-odds ratio values from the drop-down menu

    • Select Value for Color by

    • Select point from the Style drop-down menu

    • Select OK to add the profile

    • Select OK to close the Configure Plot Properties dialog

    The plot will now show a Manhattan plot (Figure 1).

    Figure 1. Customized Genome View showing genomic locations on the x-axis and -log10(p-value) of SNPs on the y-axis (Manhattan plot). Each dot represents a single SNP. The cytoband is shown along the bottom of the plot

    It is also possible to display multiple chromosomes at the same time.

    • Select Show All in the upper-right hand corner of the plot

    This displays all chromosomes vertically. We can display them horizontally for a better view.

    • Select to open the Configure Plot dialog

    • Select Genome in line for Layout

    • Select OK

    To further improve the genome-wide view, we can remove the cytoband, remove the genomic position label, color points by chromosome, and increase point size.

    • Select Cytoband in the upper right-hand corner

    • Select

    • Select the Axes tab

    • Deselect Show Base Pair Labels

    • Select Profiles

    • Select Configure

    • Set Color By to a column that gives the chromosome of each SNP/locus as a category

    • Set Shape Size to 5.0

    • Select OK to close the Configure Profile dialog

    • Select OK to apply changes

    The plot will appear as shown (Figure 2).

    Figure 2. Full genome Manhattan plot

    For details on Genome View see Chapter 6: The Pattern Visualization System in the Partek User's Manual.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Associating a spreadsheet with an annotation file
    • Select File from the main toolbar

    • Select Genomic Database under Import (Figure 1)

    Figure 1. Importing an annotation file
    • Select the annotation file; in this example, we select a .pannot file downloaded from the Partek distributed library file repository: hg19_refseq_14_01_03_v2.pannot

    • Delete or rearrange the columns as necessary; we have placed the column with identifiers (should be unique ID) that correspond to our gene list first

    • Select File then Save As Text File... to save the annotation file; we have named it Annotation File (Figure 2)

    Figure 2. Modified annotation file
    • Select () to close the annotation file

    Now we can add the annotation file to our imported gene list.

    • Right click 1 (gene_list.txt) in the spreadsheet tree

    • Select Properties from the pop-up menu

    This brings up the Configure Genomic Properties dialog (Figure 3).

    Figure 3. Selecting an annotation file using the Configure Genomic Properties dialog
    • Select Browse under Annotation File

    • Choose the annotation file; we have chosen Annotation File.txt

    If this is the first time you have used an annotation, the Configure Annotation dialog will launch. This is used to choose the columns with the chromosome number and position information for each feature. Our example annotation file has chromosome, start, and stop in separate columns.

    • Select the proper column configuration options (Figure 4)

    Figure 4. Assigning columns for chromosome and genomic positions in the annotation file
    • Select Close to return to the Configure Genomic Properties dialog

    • Select Set Column: to open the Choose column with gene symbols or microRNA names dialog (Figure 5)

    Figure 5. Choosing the column in the annotation file with gene symbols or microRNA names
    • Select the appropriate column; here the default choice of 1. Symbol is appropriate

    • Select OK to return to the Configure Genomic Properties dialog

    • Select the appropriate species and genome build options; we have selected Homo sapiens and hg19 (Figure 6)

    Figure 6. The gene list is now fully configured with an annotation file and reference genome selected
    • Select OK

    • Select () to save the spreadsheet

    The annotation file has been associated with the spreadsheet, and additional tasks can now be performed on the data. For example, because the annotation includes genomic locations, you can draw a chromosome view of the data.

    Adding annotations to a spreadsheet

    Inserting annotations from an annotation file

    If an annotation file has been associated with a spreadsheet, annotations from the file can be added as columns in the spreadsheet when each identifier is on a row.

    • Right click on a column header

    • Select Insert Annotation

    • Select columns to add from Column Configuration; we have selected Chromosome, Start, and Stop (Figure 7)

    • Select OK

    Figure 7. Adding an annotation column from the annotation file
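    Conceptually, Insert Annotation looks up each row identifier in the annotation file and appends the chosen columns. The Python sketch below illustrates that lookup on hypothetical data; the gene names, coordinates, and the insert_annotation helper are invented for illustration and are not part of Partek Genomics Suite.

```python
# Hypothetical annotation table keyed by the identifier column.
annotation = {
    "GENE_A": {"Chromosome": "1", "Start": 1000, "Stop": 5000},
    "GENE_B": {"Chromosome": "2", "Start": 20000, "Stop": 26000},
}

# Hypothetical imported gene list with one identifier per row.
gene_list = [{"Symbol": "GENE_A"}, {"Symbol": "GENE_X"}, {"Symbol": "GENE_B"}]


def insert_annotation(rows, annot, key, columns):
    """Return copies of rows with the requested annotation columns added."""
    annotated = []
    for row in rows:
        match = annot.get(row[key], {})
        new_row = dict(row)
        for column in columns:
            # None marks identifiers that are missing from the annotation.
            new_row[column] = match.get(column)
        annotated.append(new_row)
    return annotated


annotated_list = insert_annotation(
    gene_list, annotation, "Symbol", ["Chromosome", "Start", "Stop"]
)
for row in annotated_list:
    print(row)
```

    The sketch also shows why identifiers in the gene list must match the identifier column of the annotation file: GENE_X has no match, so its annotation columns stay empty.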

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    to annotate these SNPs with the closest genes.
    • Select Tools from the main toolbar

    • Select Find Overlapping Genes (Figure 1)

    Figure 1. Adding overlapping genes to a SNP list

    • Select Add a New Column with the Gene Nearest to the Region from the method dialog

    The Report Regions from the specified database dialog will open.

    • Select your preferred database. Be sure to match the species and genome build of your SNP list

    • Select OK

    This will add 3 columns to the list of SNPs spreadsheet including Nearest Feature, which will indicate the nearest gene and strand (Figure 2).

    Figure 2. Find Overlapping Genes adds three columns to a SNP list: overlapping features, nearest feature, and distance to nearest feature (bps)
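    The idea behind the Nearest Feature and distance columns can be sketched as follows. This Python example assumes that the distance is 0 when the SNP lies inside a gene and is otherwise measured to the closest gene boundary; the guide does not state how Partek Genomics Suite defines the distance, and the gene names and coordinates here are made up.

```python
def nearest_gene(position, genes):
    """Return (gene name, distance in bp) for the gene closest to position.

    genes is a list of (name, start, stop) tuples on the same chromosome.
    """
    def distance(gene):
        _, start, stop = gene
        if start <= position <= stop:
            return 0  # the SNP overlaps the gene
        return min(abs(position - start), abs(position - stop))

    best = min(genes, key=distance)
    return best[0], distance(best)


# Hypothetical genes on a single chromosome.
genes = [("GENE_A", 100, 200), ("GENE_B", 500, 800), ("GENE_C", 1500, 1600)]

for snp_position in (150, 450, 1200):
    print(snp_position, nearest_gene(snp_position, genes))
```

    A real annotation spans many chromosomes, so the SNP and the candidate genes would first need to be restricted to the same chromosome and genome build, as the database selection step above requires.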

    To allow gene list operations such as GO Enrichment or Pathway Enrichment to be performed on the SNP list, we can set the Nearest Feature column as the gene symbol column for the spreadsheet.

    • Right click the spreadsheet in the spreadsheet tree

    • Select Properties from the pop-up menu

    • Select Gene symbol instead of Marker ID

    • Select Feature in column and select Nearest Feature (Figure 3)

    • Select OK


    Figure 3. Setting Nearest Feature as the gene symbol allows gene list functions to be performed on a SNP list

    Annotating a Partek Genomics Suite-generated SNP list with SNVs

    If you have a SNP spreadsheet that was generated using Partek Genomics Suite (not imported as a .txt file), you can annotate the SNP list with gene, transcript, exon, and information about the predicted effect of the SNPs.

    • Select Tools from the main command toolbar

    • Select Annotate SNVs

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Starting with a list of genomic regions
    and Partek.GeneExpression.GenomeStudio.dll. Move the PartekReport folder to

    C:\Program Files \Illumina\GenomeStudio 2.0\Modules\BSGT\ReportPlugins

    If there is no ReportPlugins folder in the BSGT folder, create one. The path and folder names must exactly match those described above (Figure 1).

    Export report from GenomeStudio

    In a GenomeStudio genotype project:

    • Choose Analysis > Reports > Report Wizard from the main menu

    • Select Custom Report and choose Partek Report Plug-in from the drop-down list

    • Specify AnnotationName; do NOT include <> in the name. You can use the same name as the .bgx file you imported the data with, or a name unique to your dataset

    Figure 1. Configuring the GenomeStudio copy number report dialog

    • Leave all the other settings at their default values (Figure 2) and click Next

    • Specify the report file name. We recommend putting the exported files in their own folder, which allows you to move the folder instead of moving all the files individually

    • Click Finish (Figure 2)

    Figure 2. Specify output folder and file name

    The output consists of 9 files in the folder, including a project file (.ppj), an annotation file, a summary file, and 3 sets of Partek spreadsheet files; each spreadsheet consists of 2 files.

    Open project in Partek Genomics Suite

    To open the report, launch Partek Genomics Suite, choose File > Open Project, and browse to the .ppj file. Three spreadsheets will open (Figure 3).

    Figure 3. Open project in Partek Genomics Suite

    Spreadsheet 1 contains genotype calls, spreadsheet 2 contains the log R ratio (copy number on a log scale), and spreadsheet 3 contains the B allele frequency.

    To perform copy number analysis, select spreadsheet 2 (log R ratio), choose the Copy Number workflow, and start from the QA/QC section. The genotype spreadsheet is used for the Association and LOH workflows.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Plug-in zip file: PartekReportGX.zip (55KB)
    Figure 1. Creating a dot plot of gene intensity values

    A dot plot will be displayed in a new tab (Figure 2).

    Figure 2. Simple dot plot of a single gene that shows the distribution of intensities across all samples

    There are many customizations that can be made to this simple plot.

    • Select Configure Plot () from the plot command bar to launch the Configure Plot dialog (Figure 3).

    Figure 3. Configuring the data shown on the plot

    The Configure Plot dialog lets you change how the data is displayed on the plot. We will make a change to illustrate the possibilities.

    • Set Group by to 4. Tissue using the drop-down menu

    This allows us to group the samples by any categorical attribute. These attributes are specified in the parent spreadsheet.

    • Select OK to modify the plot

    We could also have changed the grouping of samples using the Group by drop-down menu above the plot.

    The order of the group columns is alphabetical by default, but can be changed to match the spreadsheet order by selecting Categoricals in spreadsheet order in the Configure Plot dialog (Figure 3).

    • Select Plot Properties () from the plot command bar to launch the Plot Properties dialog (Figure 4)

    Figure 4. Changing the appearance of a dot plot using the plot properties dialog

    The Plot Properties dialog lets you change the appearance of the plot. We will make a few changes to illustrate the possibilities.

    • Set Shape to 3. Type using the drop-down menu

    • Select the Box&Whiskers tab

    • Set Box Width to 15 pixels

    • Select the Titles tab

    • Set X-Axis under Configure Axes Titles to Tissue

    • Select OK to modify the plot

    Alternatively, we could have changed the shapes using the Shape by drop-down menu above the plot. The dot plot now shows four columns, each with a thinner box-and-whisker plot, and different shapes for different sample types (Figure 5).

    Figure 5. The Dot Plot can be modified to optimally visualize your data

    Like many visualizations in Partek Genomics Suite, the dot plot is interactive.

    • Select () to activate Selection Mode

    Legends can now be dragged and dropped to new locations on the plot. Samples can be selected by left-clicking the sample or left-clicking and dragging a box around samples.

    • Select () to activate Zoom Mode

    Left clicking on a region will zoom in on it. The zoom level can be reset by selecting ().

    • After zooming in, select () to activate Pan Mode

    Left-click and drag to move around the plot.

    • Select () to move between rows on the source spreadsheet

    • Select () to swap the horizontal and vertical axes

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    and Partek.GeneExpression.GenomeStudio.dll. Move the PartekReport folder to

    C:\Program Files (x86)\Illumina\GenomeStudio\Modules\BSGX\ReportPlugins

    If there is no ReportPlugins folder in the BSGX folder, create one. The path and folder names must exactly match those described above (Figure 1).

    Figure 1. Place PartekReport folder in the appropriate directory

    Export report from GenomeStudio

    In GenomeStudio gene expression project:

    • Choose Analysis > Reports... from the main menu

    • Select Custom Report and choose Partek Report Plug-in from the drop-down list

    • Specify AnnotationName; do NOT include <> in the name. You can use the same name as the .bgx file you imported the data with, or a name unique to your dataset

    • Choose Type by clicking on the cell, default is gene level

    • Leave all the others as default value (Figure 2)

    • Specify the report file name. We recommend putting the exported files in their own folder, which allows you to move the folder instead of moving all the files individually

    • Click OK

    Figure 2. Configuring the GenomeStudio gene expression report dialog

    Five files are exported, including a project file (.ppj), which can be opened directly in Partek Genomics Suite. The project file opens the signal intensity data in a spreadsheet and associates the annotation information with the intensity spreadsheet. All intensities are log2 transformed. If there are negative values in AVG_Signal, the data are shifted so that the lowest value is one and then log2 transformed.
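    The shift-then-log2 step can be illustrated with a short Python sketch. It follows the description above (shift only when negative values are present, so that the lowest value becomes one); how the plug-in handles zeros in data without negative values is not stated, so that case is deliberately left out. The function name and signal values are hypothetical.

```python
import math


def log2_with_shift(values):
    """Log2 transform signals, shifting first if any value is negative."""
    lowest = min(values)
    # Shift so the lowest value becomes 1, but only when negatives exist.
    shift = 1 - lowest if lowest < 0 else 0
    return [math.log2(value + shift) for value in values]


# Hypothetical AVG_Signal values for one sample.
with_negatives = log2_with_shift([-3.0, 1.0, 5.0])
all_positive = log2_with_shift([4.0, 8.0])
print(with_negatives)
print(all_positive)
```

    In the first call the values are shifted by 4 before the transform, so the lowest signal maps to log2(1) = 0; in the second call no shift is applied.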

    Open project in Partek Genomics Suite

    To open the report, launch Partek Genomics Suite, choose File > Open Project, and browse to the .ppj file. In the Gene Expression workflow, you can then proceed to the add sample attribute step.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Plug-in zip file: PartekReportGX.zip (55KB)

    Tasks available for a gene list

    • GO Enrichment

    • Pathway Enrichment

    • Filtering

    GO Enrichment

    The Gene Ontology (GO) Enrichment p-value calculation uses either a Chi-Square or Fisher’s Exact test to compare the genes in the significant gene list with the background genes, that is, all genes that could have been detected in the experiment. For a microarray experiment, the background genes consist of all genes on the chip/array; for a next generation sequencing experiment, all genes in the species transcriptome are considered background genes.
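    As an illustration of the Fisher’s Exact option, the sketch below computes a one-sided over-representation p-value from the hypergeometric distribution, which is what a one-sided Fisher’s Exact test reduces to for a 2x2 table. Whether Partek Genomics Suite reports a one-sided or two-sided p-value is not stated here, so the sidedness is an assumption, as are the function name and the counts. The gene list is assumed to be a subset of the background.

```python
from math import comb


def enrichment_p(list_in_category, list_size, background_in_category, background_size):
    """P(X >= list_in_category) when list_size genes are drawn at random
    from a background containing background_in_category category genes."""
    total = comb(background_size, list_size)
    upper = min(list_size, background_in_category)
    tail = sum(
        comb(background_in_category, k)
        * comb(background_size - background_in_category, list_size - k)
        for k in range(list_in_category, upper + 1)
    )
    return tail / total


# Hypothetical counts: 8 of 100 listed genes fall in a category that
# contains 200 of the 20,000 background genes.
p_value = enrichment_p(8, 100, 200, 20000)
print(p_value)
```

    With one category gene expected by chance (100 x 200 / 20,000), observing eight gives a very small p-value. Only gene membership counts enter the calculation, never intensity values.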

    Because the calculation is essentially comparing overlapping sets of genes and does not use intensity values, GO Enrichment can be performed on an imported gene list even without any numerical values. GO Enrichment is available through the Gene Expression workflow.

    If no annotation file has been specified for the gene list, GO Enrichment will use the full species transcriptome as the background genes. While suitable for next generation sequencing experiments, for microarray experiments, only the genes on the chip/array are appropriate. Please contact our technical support department for assistance with this step if needed.

    Pathway Enrichment

    Like GO Enrichment, Pathway Enrichment does not require numerical values, but instead operates on lists of genes - a list of significant genes vs. background genes. Consequently, Pathway Enrichment may be used with an imported list of genes even without any numerical values. The list of background genes is set to the species transcriptome by default, but can be set to a specific set of genes if the gene list has been associated with an annotation file.

    Filtering

    A gene list can be used to filter another spreadsheet. As an example, we will filter the results of an ANOVA on microarray data using a gene list. This will create a spreadsheet with ANOVA results for only the genes included in our gene list.

    • Open the filtering gene list and target spreadsheets

    • Select the target spreadsheet in the spreadsheet tree, in this example, genes are on rows in ANOVA result spreadsheet

    • Select Filter from the main toolbar

    • Select the matching column of your target spreadsheet from the Key column drop-down menu; here we have selected 4. Gene Symbol (Figure 2)

    • Select the filtering gene list from the Filter based on spreadsheet drop-down menu; here we have selected 1 (Gene List.txt)

    • Select the matching column of your filtering gene list from the Key column drop-down menu; here we have selected 1. Symbol

    • Select OK to apply the filter

    The target spreadsheet will display the filtered rows (Figure 3). Note that the number of rows has gone from 22,283 prior to filtering (Figure 1) to 153 after filtering (Figure 3).
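    The filter is essentially a key match between two tables. A minimal Python sketch of the same idea, using hypothetical rows and the column names from the example above, looks like this:

```python
# Hypothetical ANOVA result rows (target spreadsheet), genes on rows.
anova_results = [
    {"Probeset ID": "ps_1", "Gene Symbol": "GENE_A", "p-value": 0.001},
    {"Probeset ID": "ps_2", "Gene Symbol": "GENE_B", "p-value": 0.20},
    {"Probeset ID": "ps_3", "Gene Symbol": "GENE_C", "p-value": 0.04},
    {"Probeset ID": "ps_4", "Gene Symbol": "GENE_A", "p-value": 0.03},
]

# Hypothetical filtering gene list.
gene_list = [{"Symbol": "GENE_A"}, {"Symbol": "GENE_C"}]

# Key column of the filtering list, collected once for fast lookups.
wanted = {row["Symbol"] for row in gene_list}

# Keep target rows whose key column value appears in the filtering list.
filtered = [row for row in anova_results if row["Gene Symbol"] in wanted]

print(len(anova_results), "rows before filtering;", len(filtered), "after")
```

    Note that every target row whose key matches is kept, so a gene represented by several probe(sets) contributes several rows to the filtered spreadsheet.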

    To use this filtered list for downstream analysis, we can save it.

    • Right-click the open spreadsheet in the spreadsheet tree

    • Select Clone...

    • Use the Clone Spreadsheet dialog to name the new spreadsheet and choose its place in the spreadsheet hierarchy

    The new spreadsheet will open. If you want to use the new spreadsheet again in the future, be sure to save it.

    Applying Multiple Test Correction

    If your imported data contains a list of p-values, you can use any of the available multiple test corrections.

    • Select Stat from the main toolbar

    • Select Multiple Test

    • Select Multiple Test Corrections to launch a dialog with available options (Figure 4); corrected p-value column(s) will be added to the right of the selected p-value column(s)
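    This section does not list which corrections the dialog offers. As one widely used example, the Benjamini-Hochberg step-up (FDR) adjustment can be sketched in Python as follows; the function name and p-values are hypothetical, and the sketch is not taken from Partek Genomics Suite.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]
corrected = benjamini_hochberg(raw)
for p, q in zip(raw, corrected):
    print(p, round(q, 4))
```

    Each adjusted value is at least as large as the raw p-value, and the adjusted values keep the same ordering as the raw ones, which mirrors how a corrected column sits alongside the original p-value column.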

    Plotting numeric data associated with a gene list

    A variety of profile plots can be used to visualize the numerical data associated with your imported gene list.

    • Select View from the main toolbar

    • Select any applicable option

    Genome Browser

    If you have imported numerical data associated with genes (like p-values or fold-changes), you can visualize these values in the Genome Browser once an annotation file is associated to the spreadsheet, and there is genomic location information in the annotation file.

    • Right-click on a row header in the imported gene list spreadsheet

    • Select Browse to location

    If the annotations have been configured properly, you should see a Regions track for the first column of numerical data, a cytoband track, and an annotation track. You can also add another track to display a second column of numerical data.

    • Select New Track

    • Select Add a track from spreadsheet

    • Select Next >

    A new track titled Regions will be added.

    • Select Regions in the track preferences panel to edit it

    • Select the other numerical column in the Bar height by drop-down menu

    Clustering

    For a gene list with expression values on each sample, clustering can be performed. Access the clustering function through the toolbar, not from a workflow. The workflow implementations assume that the data to be clustered are found on a parent spreadsheet and the list of genes is in a child spreadsheet.

    • Select Tools from the main toolbar

    • Select Discover then Hierarchical Clustering

    Hierarchical Clustering assumes that samples are rows and genes are columns so consider transposing your data if this is not the case. If you have only one column or row of data, cluster only on the dimension with multiple categories by deselecting either Rows or Columns from What to Cluster in the Hierarchical Clustering dialog.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Sort Rows by Prototype

    Sort Rows by Prototype is a function that can identify genes with similar expression patterns. For example, if a gene with an interesting expression pattern has been detected, using Sort Rows by Prototype makes it possible to find other genes that have a similar pattern of intensity values. Although this is most commonly used for changes in gene expression over a time course, it can be applied to other experimental designs as well.

    To invoke Sort Rows by Prototype, probe(sets)/genes must be on rows. If you want to use this tool to analyze the main intensity values spreadsheet, the spreadsheet must be transposed prior to analysis. A common way to view and analyze gene expression in a time-series experiment is to include means or LS means in the ANOVA spreadsheet.

    • Configure the ANOVA dialog to include the factor or interaction of interest

    • Select Advanced... from the ANOVA dialog

    • Select LS-Mean or Mean

    • Use the drop-down menus to select the factors or interaction you want the LS mean / mean of

    • Select Add for each

    • Select OK (Figure 1)

    Figure 1. Using Advanced ANOVA setup to include group means in the ANOVA output

    • Select OK to close the ANOVA configuration dialog and open the ANOVA spreadsheet

    The Sort Rows by Prototype function uses every non-text column in a spreadsheet to build and compare patterns; any columns you do not want to include in the pattern similarity analysis need to be removed before running the function.

    If you want to preserve the ANOVA spreadsheet contents, clone the ANOVA spreadsheet prior to deleting columns.

    • Select columns you want to remove

    • Right-click on a selected column header

    • Select Delete from the pop-up menu

    We can now invoke Sort Rows by Prototype on the modified spreadsheet.

    • Select Tools from the main toolbar

    • Select Discovery

    • Select Sort Rows by Prototype... (Figure 2)

    Figure 2. Invoking Sort Rows by Prototype on spreadsheet with LS mean values for conditions/time points

    The Sort Rows by Prototype dialog will launch (Figure 3).

    Figure 3. Sort Rows by Prototype dialog

    This dialog allows you to configure the pattern, or prototype, that all probe(sets)/genes will be compared to by Sort Rows by Prototype.

    The Pattern Type options allow preset shapes to be applied to the prototype within the range specified by the Begin, End, Min, and Max parameters. The final option, From Row, allows you to select any row number in the spreadsheet to serve as the prototype. This is a useful option if you have a particular gene of interest and want to find other genes with similar expression profiles in your data set. You can also manually configure the prototype by dragging the points.

    The Select Dissimilarity Measure drop-down menu allows you to select from a wide variety of parametric and non-parametric measures of dissimilarity.

    • After configuring the prototype and selecting a dissimilarity measure, select Sort to run the function

    • Select Cancel to close the dialog

    A new column 1 will be added to the spreadsheet and the rows will be reordered (Figure 4). The new column contains the dissimilarity score for each row; the lower the value, the more similar the row is to the prototype. The row with the highest similarity to the prototype is listed first, with the other rows listed in descending similarity to the prototype.

    Figure 4. Result of sorting by prototype. The prototype gene is in the first row, while the other genes are listed based on their similarity to the prototype gene. Smaller proximity values imply more similarity to the selected shape
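
The ranking idea behind Sort Rows by Prototype can be illustrated outside the software. The following is a minimal sketch, not Partek's implementation: it assumes Euclidean distance as the dissimilarity measure, and the gene names and values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sort_rows_by_prototype(rows, prototype, dissimilarity=euclidean):
    """Return (score, name) pairs ordered from most to least similar.

    rows: dict mapping a row name to its list of numeric values.
    prototype: list of numeric values of the same length.
    """
    scored = [(dissimilarity(values, prototype), name)
              for name, values in rows.items()]
    return sorted(scored)  # lowest dissimilarity first

# Hypothetical LS means at four time points for four genes.
rows = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [2.5, 2.5, 2.5, 2.5],
}

# Use geneA as the prototype, similar in spirit to the From Row option.
ranked = sort_rows_by_prototype(rows, rows["geneA"])
for score, name in ranked:
    print(f"{name}\t{score:.3f}")
```

The prototype row itself scores 0 and is listed first, which mirrors the ordering described for the sorted spreadsheet.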

    To view the results, we can generate a profile plot of several of the rows. For example, here we will show the top five most similar probe(sets)/genes.

    • Select the row headers of the top 5 rows by selecting each while holding the Ctrl key or selecting the first then fifth while holding the Shift key

    • Select View from the main toolbar

    • Select Profiles

    The profile plot will open as a new tab (Figure 5).

    Figure 5. Profile plot of 5 probe(sets)/genes most similar to the prototype used in Sort rows by prototype

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Importing a BED file

    A BED (Browser Extensible Data) file is a special case of a region list: it is a tab-delimited text file whose first three columns contain the chromosome, start, and stop locations. To import a BED file for use as a region list, follow the import instructions for region lists. A BED file can also be visualized in the Genome Browser as an annotation file containing regions.
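
To make the three-column layout concrete, here is a minimal sketch of reading the chromosome, start, and stop fields from BED lines. It is only an illustration and not how Partek Genomics Suite reads the file; the function name and the coordinates are hypothetical.

```python
def parse_bed(lines):
    """Yield (chromosome, start, stop) from tab-delimited BED lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional header lines some BED files carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got: {line!r}")
        yield fields[0], int(fields[1]), int(fields[2])

# Hypothetical file content: one header line and two regions.
example = [
    "track name=cpgIslands\n",
    "chr1\t28735\t29810\n",
    "chr1\t135124\t135563\n",
]
regions = list(parse_bed(example))
print(regions)
```

Any columns after the third are ignored here, since only the first three are needed for a region list.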

    Using a BED file as an annotation source for the genome browser

    BED files do not contain individual sequences nor do the regions have names. For example, the UCSC Genome Browser has an annotation BED file for CpG islands. You might like to view this information in the context of a methylation microarray data set. Before you can visualize a BED file in the chromosome viewer, you must create a Partek annotation file from the BED file.

    • Select Tools from the main toolbar

    • Select Annotation Manager... (Figure 1)

    Figure 1. Selecting Annotation Manager

    • Select Create Annotation from the My Annotations tab of the Annotation Manager dialog (Figure 2)

    Figure 2. Creating a new annotation file

    • Select BED file (.bed) for Annotation Type (Figure 3)

    Figure 3. Selecting annotation file type

    • Select Browse... under Source to specify the BED file; a default new file name and destination will populate Result, but this can be changed

    • You can specify the name and save location of the new annotation file under Result; we typically choose the Microarray Libraries folder

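    • Specify the Name of the annotation database file

    • Select the correct Species and Genome Build for the annotation file from the drop-down menus (Figure 4)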

    Figure 4. Configuring annotation file creation

    Preview Chromosome Names is used when the chromosome names in the original file do not match those of the genome build and need to be modified. For our example, this is unnecessary.

    • Select OK to create the annotation

    The Annotation Manager will display the new annotation in the My Annotations tab (Figure 5).

    Figure 5. Viewing created annotation in My Annotations

    Visualizing a BED file as an annotation track in the genome browser

    In order to use a BED file as an Annotation track in the Genome Browser, first create the annotation file as described above, being careful to specify the correct species and genome build.

    • Right-click a row on any spreadsheet that has genomic features on rows (gene lists, ANOVA results, SNP detection)

    • Select either Browse to Row or Browse to Location to invoke the Genome Browser tab

    • Select New Track from the Tracks panel of the Genome Browser (Figure 6)

    Figure 6. Adding a new track to the Genome Viewer

    • Select Add an annotation track with genomic features from a selected annotation source from the Track Wizard dialog (Figure 7)

    Figure 7. Track Wizard dialog

    • Select Next >

    • Choose the annotation file you created; here we have selected UCSC CpG Islands (Figure 8)

    • If your annotation file does not contain strand information for each region, deselect Separate Strands; here we have deselected it

    Figure 8. Choosing the annotation file

    • Select Create

    A new track will be created from the annotation file (Figure 9). If Separate Strands had been selected, there would be two tracks, one for each strand, like we see for the RefSeq Transcripts - 2014-01-03 (+) and (-) tracks (Figure 8).

    Figure 9. Viewing the added annotation file as a track in the genome viewer

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Volcano Plot

    The volcano plot displays p-values and fold-changes of numerous genomic features (e.g., genes or probe sets) at the same time. This allows differentially expressed genes to be quickly identified and saved as a gene list.

    Note: the same list can be generated without a visual aid using the List Manager (ANOVA Streamlined tab).

    We will invoke a volcano plot from an ANOVA results child spreadsheet with genes on rows.

    • Select View from the main toolbar

    • Select Volcano Plot (Figure 1)

    Figure 1. Invoking a volcano plot on an ANOVA results spreadsheet

    The Volcano Plot Configure dialog will open (Figure 2).

    Figure 2. Select the columns to display in the volcano plot

    • Select the fold-change and p-value columns you would like to visualize from the ANOVA results spreadsheet; here we have chosen 12. Fold-Change(Down Syndrome vs. Normal) for the X Axis and 10. p-value(Down Syndrome vs. Normal) for Y Axis

    • Select OK

    The volcano plot will open in a new tab (Figure 3). Control and color options for the volcano plot are largely similar to those described for a dot plot. On volcano plots with many probe(sets)/genes, the shapes and sizes of individual probe(sets)/genes will not be visible until they are selected.

    Figure 3. The volcano plot shows each probe(set)/gene as a point. The X Axis shows fold change with no change (N/C) as the mid-point. The Y Axis shows p-values in descending value from a maximum of 1 at the X Axis intersection.

    To facilitate analysis, we can add cutoff lines for both fold-change and p-value.

    • Select the Plot Rendering Properties icon

    • Select the Axes tab

    • Select Set Cutoff Lines (Figure 4)

    Figure 4. Adding cutoff lines to the volcano plot

    • Set Vertical Line(s) to 1.3 and -1.3

    • Set Horizontal Line(s) to 0.05

    • Select Select all points in a section

    • Select OK (Figure 5)

    Figure 5. Setting cutoff lines. The vertical lines are fold-change cutoffs. The horizontal line is a p-value cutoff.

    • Select OK to close the Plot Rendering Properties dialog

    The volcano plot now has cutoff lines for fold-change and p-value (Figure 6).

    Figure 6. Cutoff lines facilitate visual analysis of ANOVA results

    Because we selected Select all points in a section when adding the cutoff lines, selecting any of the quadrants will select all probe(sets)/genes in that quadrant. If this option is not selected, individual probe(sets)/genes or groups can be selected using selection mode. Gene lists can be generated from selected probe(sets)/genes.

    If columns are selected in the ANOVA results source spreadsheet for the volcano plot, only those columns will be included in the created list.

    • Select the upper right-hand quadrant of the volcano plot

    • Right-click the selected quadrant

    • Select Create List (Figure 7)

    Figure 7. Creating a gene list from a volcano plot

    • Give the new list a name and description as appropriate

    • Select OK

    The list will be saved as a text file and open as a child spreadsheet in the Analysis tab.
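
The selection made with the cutoff lines can also be expressed as a simple filter. The sketch below assumes signed fold-change values (positive for up-regulation, negative for down-regulation) and uses the 1.3 and 0.05 cutoffs from this example; the gene names and numbers are hypothetical.

```python
import math

def upper_right_quadrant(results, fc_cutoff=1.3, p_cutoff=0.05):
    """Return names of features with fold change above the cutoff and
    p-value below the cutoff.

    results: iterable of (name, fold_change, p_value) tuples.
    """
    return [name for name, fc, p in results if fc > fc_cutoff and p < p_cutoff]

# Hypothetical ANOVA results: (gene, signed fold change, p-value).
anova = [
    ("gene1", 2.10, 0.001),
    ("gene2", -1.80, 0.020),
    ("gene3", 1.05, 0.400),
    ("gene4", 1.50, 0.200),
]
selected = upper_right_quadrant(anova)

# Plotting -log10(p) puts the smallest p-values at the top of the plot.
heights = {name: -math.log10(p) for name, _, p in anova}
print(selected)
```

Only gene1 passes both cutoffs here; gene4 exceeds the fold-change cutoff but not the p-value cutoff, so it would sit in the lower right section of the plot.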

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Configuring the GO ANOVA Dialog

    The setup dialog for GO ANOVA can be found in the Biological Interpretation section of the expression workflows (Gene Expression, MicroRNA Expression, Exon, RNA-Seq, miRNA-Seq). It is recommended that GO ANOVA is run on the sheet with expression levels, after import and normalization, though GO ANOVA can be run on any spreadsheet with samples on rows and genes on columns. If a child spreadsheet is selected, such as the result of a prior ANOVA analysis, then the test will be automatically run on the parent spreadsheet.

    Upon selecting GO ANOVA (Biological Interpretation > Gene Set Analysis), Partek Genomics Suite will first offer the opportunity to configure the parameters of the test and exclude functional groups with too few or too many genes (Figure 1). To save time when running GO ANOVA, the size of the GO categories analyzed can be limited using the Restrict analysis to function groups with fewer than __ genes option. Large GO categories may be less interesting and also take the most time to analyze. We recommend restricting the analysis to groups with fewer than 150 genes, as this can make the analysis much quicker and the results easier to interpret. In the current example, the maximum category size was set to only 20 genes, for demonstration purposes.

    Figure 1. Configure the parameters of the test: gene ontology categories with too few or too many genes can be excluded
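
The effect of the size restriction can be sketched as a filter on category sizes. This is an illustration only: the function name, the lower bound of 2 genes, and the category names are hypothetical, while the upper bound of 150 follows the recommendation above.

```python
def restrict_categories(categories, min_genes=2, max_genes=150):
    """Keep categories with at least min_genes and fewer than max_genes genes.

    categories: dict mapping a category name to its list of genes.
    """
    return {name: genes for name, genes in categories.items()
            if min_genes <= len(genes) < max_genes}

# Hypothetical categories of very different sizes.
categories = {
    "category_small": ["g1"],
    "category_medium": ["g1", "g2", "g3"],
    "category_large": [f"g{i}" for i in range(200)],
}
kept = restrict_categories(categories)
print(list(kept))
```

With these settings only the medium-sized category is analyzed, which is why a tighter upper limit shortens the run time.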

    Scatter Plot and MA Plot

    Scatter Plot

    A scatter plot is a simple way to visualize differentially expressed genes. We can plot a scatter plot with gene expression values for two samples at one time. While most probe(sets)/genes fall on a 45° line, up- or down-regulated genes are positioned above or below the line.

    To draw a scatter plot, you first need to transpose the original intensities spreadsheet so that the samples are on columns and probe(sets)/genes are on rows.


    The next dialog (Figure 2) specifies the method of mapping genes to gene sets. Default mapping file is built from annotation files from geneontology.org. Custom mapping file points to mapping files available on the local computer and present in the Microarray libraries directory. Create a new mapping file from the chip's annotation file will try to build the mapping file from the annotation file created by the microarray vendor. Create a new mapping file from a spreadsheet enables you to create a custom mapping file from an open spreadsheet that has gene symbols in one column and gene groups in the other. Finally, files in gene matrix transposed (GMT) or gene annotation (GA) formats can also be used.

    Figure 2. Setting the method of mapping genes to gene sets
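
To show what a gene-to-gene-set mapping looks like as data, the sketch below builds one from two-column rows (gene symbol, gene group) and reads one line in GMT layout (set name, description, then gene symbols, all tab-separated). The function names and the example genes and groups are hypothetical, and this is not the format of Partek's own mapping files.

```python
from collections import defaultdict

def mapping_from_pairs(pairs):
    """Group gene symbols by gene group, dropping repeated symbols."""
    groups = defaultdict(list)
    for symbol, group in pairs:
        if symbol not in groups[group]:
            groups[group].append(symbol)
    return dict(groups)

def parse_gmt_line(line):
    """Split one GMT line into (set name, description, list of genes)."""
    name, description, *genes = line.rstrip("\n").split("\t")
    return name, description, genes

# Hypothetical spreadsheet rows: gene symbol in one column, group in the other.
pairs = [
    ("MYH1", "muscle"),
    ("ACTA1", "muscle"),
    ("SCN1A", "ion transport"),
    ("MYH1", "muscle"),
]
mapping = mapping_from_pairs(pairs)
gmt = parse_gmt_line("muscle\thypothetical description\tMYH1\tACTA1\n")
print(mapping)
```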

    To set up the GO ANOVA dialog, you must consider all factors that would normally be included in an ANOVA model analyzing gene expression among the samples (Figure 3). Briefly, these include:

    • Experimental factors

    • Factors explaining sample dependence

    • Factors explaining noise

    For more details on ANOVA, see Chapter 11 of the User’s Manual.

    Figure 3. GO ANOVA setup dialog. Including a factor in the ANOVA model (ANOVA Factors) will identify gene ontology (GO) categories whose expression is consistently affected across the genes within the category by the factor of interest. Including a factor as a Disruption Factor will identify GO categories where the expression of the genes within the category is affected, but not uniformly across the genes within the category. Genes (probesets) can be excluded based on expression levels, to reduce the noise.

    Experimental Factors

    Factors inherent to the experiment include variables that would be considered as the experimental variables during experiment design. Generally this will include all variables necessary to answer the questions of the researcher. Examples may include factors such as tissue type, disease state, treatment, or dosage.

    Sometimes factors do not act independently of each other. For example, different dosages of a drug may affect patients differently over time, or a drug may not affect tissues equally as in many toxicity studies. If the effect of one variable on the other is either suspected of occurring, or of particular interest, an interaction between the two factors should be included. To do this, select the two factors simultaneously by CTRL-clicking the factors and then select Add Interaction.

    Factors Explaining Sample Dependence

    Factors that control for sample dependence include variables that account for relationships between samples. If tissues are collected in pairs from the same patient, patient ID should be included. Similarly, if tissues are collected from two distinct populations, this variable should probably be included as well.

    Factors Explaining "Noise"

    Noise variables may be caused by technical processes used during sample collection and processing. Scan date and dye color are often among these variables.

    Optional Disruption Factor(s)

    Factors included in the GO ANOVA fall into two separate categories: the normal ANOVA factors (middle box) and those interacting with the gene (right-side box).

    Fundamentally, you can run the GO ANOVA with the same parameters used to run a standard ANOVA analysis on gene expression data. (In other words, the middle box of the GO ANOVA is populated exactly as the normal ANOVA and the Interact with Gene box is left empty.) If such an analysis is run, the results would be similar to a standard statistical analysis, except resulting data will report on differential expression of functional categories instead of individual genes. Expression of a functional group is derived from the mean of all genes included within the group. Running GO ANOVA with the same parameters as the differential expression analysis is the most common method of running GO ANOVA. This keeps the analysis much more accessible and the results are easier to interpret.
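
The statement that the expression of a functional group is derived from the mean of its genes can be illustrated with a short sketch. It only shows the averaging step, not the ANOVA itself, and the function name, sample names, and expression values are hypothetical.

```python
from statistics import mean

def category_expression(sample, genes):
    """Mean expression of the listed genes in one sample.

    sample: dict mapping gene name to expression value.
    Genes missing from the sample are skipped; returns None if none are found.
    """
    values = [sample[g] for g in genes if g in sample]
    return mean(values) if values else None

# Hypothetical log2 expression values for two samples.
samples = {
    "sample1": {"geneA": 8.0, "geneB": 6.0, "geneC": 3.0},
    "sample2": {"geneA": 9.0, "geneB": 7.0, "geneC": 3.0},
}
category = ["geneA", "geneB"]  # genes assigned to one functional group
scores = {name: category_expression(values, category)
          for name, values in samples.items()}
print(scores)
```

Each sample now has a single value for the functional group, which is what a category-level comparison between groups of samples works with.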

    There is no need to interact a factor with the gene if such an interaction is not of interest. The rightmost box in the GO ANOVA setup is optional and may be left empty if this is the case.

    More advanced analyses can include factors that interact with the genes in the GO ANOVA model. After factors are added to the ANOVA factor(s) box, some can additionally be added to the Disruption Factor(s) box. At the mathematical level, this will include the Factor*Gene term in the model, called a Factor-Gene interaction. At the biological level, this will test whether patterns of gene expression within the functional group are being modified as a result of the factor. This altering of gene expression patterns is referred to in this document as the disruption of the functional group.

    For example, if comparing different tissue types, adding tissue to the middle ANOVA factor(s) box will identify entire GO functional groups that are up- or down-regulated between tissue types. If comparing nerves and muscles, this might include categories such as myosin binding or actin production, which will be wholly up-regulated in muscles because the function is much less important in nerves.

    By interacting tissue with the gene in the model (adding tissue to the rightmost box), the interaction p-value may provide a method of discovering categories where total expression might not change significantly but the pattern of gene expression within the category is altered or disrupted. Within a functional group, the interaction p-value represents how similar the patterns of gene expression are between the different tissues. One example of a functional group identified by a tissue*gene interaction might be a category such as ion transfer. Ion transfer is equally important to both nerve and muscle function, but the distribution of ion channels and many of the responsible genes may be quite different between the two.

    Sometimes factors may be included in the Interact with Gene box even if they are not of specific interest, in the same way that factors controlling for noise are added to the middle ANOVA factors box. If any factors are included in the Disruption Factor(s) box, the more advanced model must fit the data as well as possible to give the most accurate p-values. All factors that may alter gene expression patterns should be included. It is important to keep in mind that GO ANOVA is not only looking for significance in the factors included, but is attempting to fit the data in general. As appropriate factors are added to the model, more aspects of the data are analyzed, the model becomes a better fit to the true data, and the results become more accurate.

    To understand how including a Gene*Factor interaction may improve the fit of the model, consider the complex GO ANOVA design in the case of a dose-time analysis of a drug. While it may seem clear that dose, time, and the dose*time interaction should be specified as ANOVA factors in the middle box (to consider the effect of dose, time, and the change in the effect of dose over time), what to put in the rightmost Gene*Factor box is not as clear. Adding dose alone (which is actually Dose*Gene) will check whether different drug doses affect the pattern of gene expression. Similarly, adding time to the right box (which is actually Time*Gene) will identify gene ontology categories that are affected at different times but differentially across the genes. While this may be the true limit of the questions of interest, including the interactions of the gene with both dose and time may be prudent. In general, if it is likely, or expected, that a factor will affect gene distribution within functional categories, then the factor should be included in the Disruption Factor(s) box if the gene distribution is being analyzed at all.

    To review, including a factor in the middle box will identify GO categories whose expression is consistently affected across the genes within the category by the factor of interest. Including a factor in the right box (factor*gene) will identify gene ontology categories where the expression of the genes within the category is affected, but not uniformly across the genes within the category.

    Contrasts

    GO ANOVA is not restricted to analysis of factors with only two levels. The ANOVA p-value for a factor tests the hypothesis that all of its groups are equivalent. While this is useful in general, sometimes tests comparing only two sets of data are more desirable. Using contrasts to define pairwise comparisons in an ANOVA model is superior to using a test that is limited to a two-group comparison.

    To specify individual pairwise comparisons, press the Contrast button. Contrasts are performed on groups already defined in the ANOVA model. If two tissue types should be compared to each other, select the tissue term from the Select Factor/Interaction drop-down menu in the upper left. Select either one or a set of categories and add them to group 1 and group 2. All samples falling into group 1 will be compared to all samples falling into group 2. Output will include not only a p-value, but also a fold change. This fold change will represent the average fold change of the GO category between the two groups. Fold change is calculated as Group 1 divided by Group 2. For data in log space, the data is antilogged as well; fold change output is always for data on a linear scale.
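
The fold-change arithmetic can be sketched as follows. This is a simplified illustration under stated assumptions: group values are summarized by their plain mean, and for log-transformed data the difference of the group means is antilogged, which may differ in detail from the ANOVA-based estimate the software reports. The function name and values are hypothetical.

```python
from statistics import mean

def fold_change(group1, group2, log_base=None):
    """Group 1 divided by Group 2, reported on a linear scale.

    log_base: None for linear-scale input, or the base (for example 2)
    if the input values are log-transformed.
    """
    if log_base is None:
        return mean(group1) / mean(group2)
    # In log space a ratio becomes a difference, which is then antilogged.
    return log_base ** (mean(group1) - mean(group2))

# Hypothetical category values for two groups of samples.
linear_fc = fold_change([4.0, 6.0], [2.0, 3.0])            # linear input
log_fc = fold_change([9.0, 9.0], [8.0, 8.0], log_base=2)   # log2 input
print(linear_fc, log_fc)
```

Both calls describe a doubling in Group 1 relative to Group 2, even though the second starts from log2 values.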

    Excluding Genes

    Check Exclude probe sets and differential expression p-value(s) > to filter out probe sets (=genes) that are not expressed in any of the samples. The Exclude probe sets option will remove any gene that meets the specified limit. Using the default options, this will remove low-expression genes. Note that the default value of 3 is a suggestion for Affymetrix expression arrays and may not be applicable to other data sets. We suggest performing exploratory analysis and inspecting the distribution of the expression values first (e.g., View > Histogram > Row or View > Box and Whiskers > Row). The sub-checkbox, differential expression p-values, provides an override to the low expression limit. Here, a gene will be included in the analysis despite a low expression value if the gene displays a p-value below the specified limit, suggesting that the gene is differentially expressed.
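
The exclusion rule and its p-value override can be sketched as a small predicate. The exact comparison the software applies is not spelled out here, so this sketch assumes a gene counts as low-expression when its maximum value across samples is at or below the limit; the function name, the p-value limit of 0.05, and the example values are hypothetical.

```python
def keep_gene(expression_values, p_value, expression_limit=3.0, p_limit=0.05):
    """Return True if the gene should stay in the analysis.

    A gene is dropped when it is low in every sample, unless its
    differential expression p-value is below p_limit.
    """
    low_everywhere = max(expression_values) <= expression_limit
    rescued = p_value < p_limit
    return (not low_everywhere) or rescued

# Hypothetical genes: low and not significant, low but significant, expressed.
print(keep_gene([2.1, 2.5, 2.9], p_value=0.5))
print(keep_gene([2.1, 2.5, 2.9], p_value=0.001))
print(keep_gene([2.0, 6.5, 7.0], p_value=0.5))
```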

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    • Select the main spreadsheet

    • Select Transform from the main toolbar

    • Select Create Transposed Spreadsheet...

    • Select the column with sample IDs from the drop-down menu

    • Select OK

    A new temporary spreadsheet will be created with probe(sets)/genes on rows and samples on columns.

    • Select the two sample columns you would like to compare

    • Select View from the main toolbar

    • Select Scatter Plot (Figure 1)

    Figure 1. Invoking a scatter plot from a spreadsheet with probe(sets)/genes on rows and samples on columns

    • Select Yes when asked if you want to only use the selected columns

    • Select Yes when asked if you are sure you would like to draw the scatter plot

    The scatter plot will open in a new tab. We can add a regression line to the plot.

    • Select the Plot Rendering Properties icon from the plot command bar

    • Select Axes

    • Select Set Regression Lines

    • Select Regression line of y on x

    • Set Line Width to 5

    • Select OK (Figure 2)

    Figure 2. Configuring a regression line

    • Select OK to close the Plot Rendering Properties dialog

    The scatter plot now features a regression line dividing the probe(sets)/genes (Figure 3).

    Figure 3. Each dot on the plot represents the intensity value of a probe(set)/gene
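
For readers who want to see what a regression line of y on x represents, here is a minimal ordinary least-squares sketch. It is a generic textbook calculation, not a description of how the plot computes its line, and the intensity values are hypothetical.

```python
def regress_y_on_x(xs, ys):
    """Ordinary least-squares line of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical log2 intensities of four genes in two samples.
sample_x = [1.0, 2.0, 3.0, 4.0]
sample_y = [1.5, 2.5, 3.5, 4.5]
slope, intercept = regress_y_on_x(sample_x, sample_y)
print(slope, intercept)
```

Genes far above or below this line are the candidates for up- or down-regulation between the two samples.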

    MA Plot

    The MA plot can be used to display a difference in expression patterns between two samples. The horizontal axis (A) shows the average intensity while the vertical axis (M) shows the intensity ratio between the two samples for the same data point. In essence, an MA plot is a scatter plot tilted to the side so that the differentially expressed probe(sets)/genes are located above or below the 0 value of M. An MA plot is also useful to visualize the results of normalization where you would hope to see the median of the values follow a horizontal line.

    The MA plot is invoked on the original intensities spreadsheet without any need for transposition.

    • Select View from the main toolbar

    • Select MA Plot

    The MA plot will launch in a new tab showing the first two rows as the comparison (Figure 4).

    Figure 4. MA plot comparing the expression levels between two samples. Each dot on the plot represents a single genomic feature (gene or probe set). The average signal for each genomic feature is shown on the horizontal axis (A), while the ratio is shown on the vertical axis (M).

    The samples displayed can be changed using the select sample menus on the left-hand side.
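
The M and A values can be sketched with the conventional definitions, where M is the log2 ratio of the two intensities and A is the mean of their log2 values. This assumes positive, linear-scale input; whether the software applies exactly this transformation is an assumption, and the numbers are hypothetical.

```python
import math

def ma_values(x, y):
    """Return (M, A) for two positive linear-scale intensities.

    M: log2 ratio of x to y. A: average of the two log2 intensities.
    """
    log_x, log_y = math.log2(x), math.log2(y)
    return log_x - log_y, (log_x + log_y) / 2

# Hypothetical intensities of one probe set in two samples.
m_up, a_up = ma_values(8, 2)        # higher in the first sample
m_same, a_same = ma_values(4, 4)    # unchanged
print(m_up, a_up, m_same, a_same)
```

An unchanged probe set has M equal to 0, which is why differentially expressed probe(sets)/genes appear above or below the 0 line.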

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Starting with a list of genomic regions

    Importing a region list

    A region list must contain the chromosome, start location, and stop location as the first three columns. The chromosome number in the region list must be compatible with the genomic annotation for the species if you plan to use any feature (like motif detection) that requires reference sequence information.

    • Import the region list as described above for text files with the following options

      • Select Other for data type

      • Set chromosome as a text field

      • Set location start and stop as either integer or text fields

    • Right-click on the imported spreadsheet in the spreadsheet tree

    • Select Properties

    • Select List of genomic regions from the Configure Spreadsheet dialog to add region to the properties (Figure 1)

    Figure 1. Adding region to the properties of a spreadsheet

    The spreadsheet properties will now include region. Alternatively, region can be added as a spreadsheet property from the Configure Genomic Properties dialog by selecting Advanced..., choosing region from the drop-down menu, selecting Add, and selecting OK.

    If you would like to do any operation that requires looking up the reference genomic sequence information for the regions based on genomic location, you will need to specify the species for this region list.

    • Right-click on the imported spreadsheet in the spreadsheet tree

    • Select Properties

    • Select species from the Add Property drop-down menu and click Add

    Motif detection

    Starting with a region list, you may detect either known or de novo motifs using the ChIP-Seq workflow if your spreadsheet has been associated with a species and a reference genome.

    • Select ChIP-Seq from the Workflows drop-down menu

    • Select Motif detection from the Peak Analysis section of the workflow

    Both Discover de novo motifs and Search for known motifs can be performed. Motif detection requires sequence information for the genome; you can specify either a .2bit file or a .fa file, which can be used to create a .2bit file.

    Determining the average values for a region list

    If you have a region list or a .BED file and you have a microarray experiment with data, you can summarize the microarray data by the genomic coordinates contained in the region list. For example, the region list contains a list of CpG islands, the experiment contains methylation percentage values for probes (β values), and you would like to summarize the methylation values of all probes in each CpG island.

    • Import the region list (or .BED file)

    Be sure that you have added the region property. The list of region coordinates (chromosome, start, stop) from the region list will be mapped against the reference genome specified for the microarray data so specifying Species and Genome Build for your region list is unnecessary.

    • Open the microarray data spreadsheet; this spreadsheet should have an associated annotation file that contains genomic location information.

    Samples should be on rows and data on columns in the microarray data spreadsheet.

    • Select the region list spreadsheet

    • Right-click any column header in the region list spreadsheet

    • Select Insert Average from the pop-up menu (Figure 2)

    Figure 2. Adding the average values for a region list

    • Select the microarray data spreadsheet containing the values you want to average for each region from the Get average from spreadsheet drop-down menu

    There are three options for averaging the data (Figure 3). Mean of samples significant in region is used when the region list has SampleIDs from the microarray data set associated with each region. In this case, only the microarray data set samples specified for each region would be included in the mean calculation. Mean of all samples will add columns for the mean value of all probes for all samples and the number of probes for all samples in each region. Mean value for all samples separately will add two columns for each sample with the mean value of all probes for that sample and the number of probes for that sample in each region.

    • We have selected Mean of all samples

    • Select OK (Figure 3)

    Figure 3. Selecting options for adding average values for regions

    Columns will be added to the regions list spreadsheet. Here, we have added two columns with the average β-value for all samples in each CpG island and the number of probes in each CpG island (Figure 4).

    Figure 4. Added average beta values and number of probes per CpG island
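To make the Mean of all samples calculation concrete, here is a minimal Python sketch of the same idea outside Partek Genomics Suite. The region and probe values are hypothetical, the function name `average_by_region` is our own, and treating the start and stop coordinates as inclusive is an assumption; it is not the software's implementation.

```python
from statistics import mean

# Hypothetical inputs: regions as (chromosome, start, stop) and probes as
# (chromosome, position, beta values for each sample).
regions = [("chr1", 100, 200), ("chr1", 500, 600)]
probes = [
    ("chr1", 120, [0.2, 0.4]),
    ("chr1", 180, [0.6, 0.8]),
    ("chr1", 550, [0.1, 0.3]),
    ("chr2", 150, [0.9, 0.9]),
]

def average_by_region(regions, probes):
    """Return (region, mean of all probe values for all samples, probe count)."""
    rows = []
    for chrom, start, stop in regions:
        # Assumption: a probe belongs to a region when start <= position <= stop.
        inside = [betas for p_chrom, pos, betas in probes
                  if p_chrom == chrom and start <= pos <= stop]
        values = [value for betas in inside for value in betas]
        rows.append(((chrom, start, stop), mean(values) if values else None, len(inside)))
    return rows

summary = average_by_region(regions, probes)
```

Each entry of `summary` corresponds to one row of the region list with its two added columns: the pooled mean and the number of probes.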

    hashtag
    Find region overlaps

    If you have two or more region lists with coordinates on the same reference genome, you can compare them to identify overlapping regions.

    • Open all region list spreadsheets that you want to compare

    • Select Tools from the main toolbar

    • Select Find Region Overlaps (Figure 5)

    Figure 5. Selecting Find Region Overlaps

    The Find Region Overlaps tool has two modes of operation. The first, Report all regions, creates a new spreadsheet with any regions that did not intersect and all regions of intersection between any of the input lists. For each intersection, the start and stop coordinates of the intersection and the percent overlap of the intersected region with each of the regions in the input lists are reported. The second, Only report regions present in all lists, creates a new spreadsheet with the intersected regions found in all the lists.

    • Select your preferred mode; we have selected Only report regions present in all lists

    • Select Add New Spreadsheet to add any spreadsheets you want to compare; we are comparing two region list spreadsheets (Figure 6)

    • Select OK

    Figure 6. Configuring Find Overlapping Regions

    A new region list spreadsheet will be created (Figure 7). The new region list is a temporary spreadsheet so be sure to save it if you want to keep it.

    Figure 7. Spreadsheet with regions present in all lists
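The intersection logic can be sketched in a few lines of Python for the two-list case. This is only an illustration under stated assumptions: the function name `intersect` and the example coordinates are hypothetical, coordinates are treated as half-open (start included, stop excluded), and the software's own handling of boundaries may differ.

```python
def intersect(list_a, list_b):
    """Report each intersection between two region lists.

    Regions are (chromosome, start, stop) tuples. For every intersection we
    return its coordinates and the percent of each input region it covers.
    """
    hits = []
    for chrom_a, start_a, stop_a in list_a:
        for chrom_b, start_b, stop_b in list_b:
            if chrom_a != chrom_b:
                continue
            start, stop = max(start_a, start_b), min(stop_a, stop_b)
            if start < stop:  # a non-empty intersection
                length = stop - start
                hits.append({
                    "region": (chrom_a, start, stop),
                    "percent_of_a": 100.0 * length / (stop_a - start_a),
                    "percent_of_b": 100.0 * length / (stop_b - start_b),
                })
    return hits

overlaps = intersect(
    [("chr1", 100, 200)],
    [("chr1", 150, 250), ("chr2", 100, 200)],
)
```

With two input lists, the result matches the Only report regions present in all lists mode: only the shared stretch of chr1 is kept.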

    hashtag
    Importing a genomic position list for SNV annotation

    To be annotated using the Annotate SNVs tool, an imported SNV position list must have four columns per locus:

    1. Position of the SNP listed as chr.basePosition

    2. Sample ID or name

    3. The reference base

    4. The SNP call (sample genotype base)

    • Prepare input list as shown (Figure 8) with four columns describing the position, sample, reference base, and sample genotype base for each SNV

    Figure 8. An imported SNV list must follow this format to be annotated by the Annotate SNV tool. The first column must be the position and the position must follow the format shown, chr.basePosition

    • Save as either a tab-separated or comma-separated file

    • Import the table as a text file

    • Select Genomic data for What type of data is this file?

    The Annotate SNVs tool can now be invoked on this spreadsheet to generate an annotation spreadsheet (Figure 9).

    Figure 9. Annotate SNVs creates a new spreadsheet annotating each SNV from the source list
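If the SNV list is generated by a script, the four-column layout can be written and checked before import. The following Python sketch is illustrative only: the records, the column header names, and the function names are hypothetical, and we assume that positions look like `chr1.12345`; check Figure 8 for the exact form expected.

```python
import csv
import io

# Hypothetical SNV records: (position, sample, reference base, genotype base).
snvs = [
    ("chr1.12345", "Sample_A", "A", "G"),
    ("chr7.55249071", "Sample_B", "C", "T"),
]

def is_valid_position(position):
    """Check the chr.basePosition pattern: text, a dot, then an integer."""
    chrom, sep, base = position.rpartition(".")
    return bool(chrom) and sep == "." and base.isdigit()

def write_snv_list(records, handle):
    """Write a tab-separated SNV list with the position in the first column."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header names are an assumption; only the column order comes from the guide.
    writer.writerow(["Position", "Sample", "Reference", "Genotype"])
    for record in records:
        if len(record) != 4 or not is_valid_position(record[0]):
            raise ValueError(f"bad SNV record: {record!r}")
        writer.writerow(record)

buffer = io.StringIO()
write_snv_list(snvs, buffer)
```

Replacing `buffer` with an open file handle would produce a text file ready to import.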

    hashtag
    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    XY Plot / Bar Chart

    The XY plot / bar chart displays the intensity of one probe(set)/gene across two categorical variables. Only one probe(set)/gene may be visualized at a time.

    hashtag
    Invoking from a gene list

    We will invoke an XY plot from a gene list child spreadsheet with genes on rows. The parent spreadsheet should include the categorical variables you want to chart.

    • Right-click on the row header of the gene you want to visualize

    • Select XY Plot (Orig. Data) from the pop-up menu (Figure 1)

    Figure 1. Invoking an XY Plot from a gene list child spreadsheet

    An XY plot will be displayed in a new tab (Figure 2).

    Figure 2. By default, an XY plot invoked from a gene list will have the first categorical variable as columns and the second categorical variable as shapes/colors

    To display the change in gene expression over time for each treatment condition, we need to modify this plot.

    • Select () from the plot command bar

    • Set X-Axis to 3. Time using the drop-down menu

    • Set Separate by to 2. Treatment using the drop-down menu

    To help visualize the connection between time points, we can add connecting lines.

    • Select () from the plot command bar

    • Set Plot Style to lines using the drop-down menu

    • Select OK

    The plot now shows time on the x-axis, plots each treatment as a separate series, and connects the points of each treatment across time points with lines (Figure 3). Each point is the LS mean value of all samples with the same values for the two selected categorical variables. The error bars are standard error.

    Figure 3. Modifying the XY plot to enable analysis of gene expression changes in a treatment condition over a time course. In this experiment, only the control was measured at time 0.
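As a rough illustration of what each point and error bar represents, the Python sketch below groups samples by the two categorical variables and reports a mean and standard error per group. It uses the plain arithmetic mean as a stand-in for the LS mean, which is an assumption, and the sample values and the function name `summarize` are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def summarize(samples):
    """Group (time, treatment, intensity) samples and summarize each group.

    Returns {(time, treatment): (mean, standard error)}. The standard error
    is None when a group has a single sample and cannot be estimated.
    """
    groups = defaultdict(list)
    for time, treatment, intensity in samples:
        groups[(time, treatment)].append(intensity)
    summary = {}
    for key, values in groups.items():
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else None
        summary[key] = (mean(values), se)
    return summary

points = summarize([
    (0, "control", 5.0),
    (0, "control", 7.0),
    (8, "treated", 9.0),
])
```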

    While most of the plot controls are shared with the dot plot, the XY plot does have a few unique options.

    • Select () to automatically cycle through each row (gene) in the source spreadsheet

    • Select () to stop the cycling

    This feature is useful when performing visual analysis of patterns in gene expression changes in a list of genes.

    The drop-down menu adjacent to the previous/next () controls lets you switch source spreadsheets.

    Lines, but not points, can be selected when using Selection Mode ().

    hashtag
    Invoking from the parent spreadsheet

    It is also possible to invoke an XY plot from the parent spreadsheet using the main toolbar.

    • Select the parent spreadsheet in the spreadsheet tree

    • Select View from the main toolbar

    • Select XY Plot / Bar Chart ...

    The Create XY Plot / Barchart dialog will open (Figure 4).

    Figure 4. Invoking an XY Plot from the main toolbar

    An XY plot will be displayed in a new tab (Figure 5).

    Figure 5. The gene name associated with the probe(set) column is displayed as the chart title by default

    Selecting previous/next () will navigate along either rows or columns, whichever has probe(set)/gene information.

    To switch the source of this plot to one of the gene lists we have created, we can use the drop-down menu next to the previous/next controls.

    hashtag
    Bar Chart Plots

    The data displayed by an XY plot can instead be displayed as a bar chart with overlaid bars, vertically stacked bars, or horizontally stacked bars. A bar chart can be directly invoked or an XY plot can be converted into a bar chart (and vice versa).

    • Invoke the plot from a gene list using the Bar Chart (Orig. Data) option in the pop-up menu (Figure 1)

    • Invoke the plot from the main toolbar by selecting one of the bar chart options in the Line Style drop-down menu (Figure 4)

    • Invoke the plot as an XY plot, select (), then select one of the bar chart options from the Plot Style drop-down menu in the Plot Rendering Properties dialog (Figure 6)

    Figure 6. An XY Plot can be converted to a Barchart using the Plot Rendering Properties dialog

    hashtag
    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Specify the Species Name and Genome Build from the drop-down menus
  • Select OK

  • Set the position column Type to text
  • Set the other columns Type to categorical

  • Select Genomic location instead of marker IDs from the Choose the type of genomic data drop-down menu of the Configure Genomic Properties dialog

  • Specify the Species and Genome Build

  • Select OK


    Select OK


    Export methylation data to Illumina GenomeStudio using Partek report plug-in

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

    hashtag
    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Trio/Duo Analysis

    This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

    hashtag
    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    GenomeStudioMethylationPlugin.pdf (PDF, 221KB)
    Using the Trio Workflow.pdf (PDF, 689KB)

    Hierarchical Clustering Analysis

    hashtag
    What is Hierarchical Clustering?

    Hierarchical clustering groups similar objects into clusters. To start, each row and/or column is considered a cluster. The two most similar clusters are then combined and this process is iterated until all objects are in the same cluster. Hierarchical clustering displays the resulting hierarchy of the clusters in a tree called a dendrogram. Hierarchical clustering is useful for exploratory analysis because it shows how samples group together based on similarity of features.

    Hierarchical clustering is an unsupervised clustering method. Unsupervised clustering methods do not take the identity or attributes of samples into account when clustering. This means that experimental variables such as treatment, phenotype, tissue, number of expected groups, etc. do not guide or bias cluster building. Supervised clustering methods do consider experimental variables when building clusters.
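The merge-the-closest-pair procedure described above can be sketched in plain Python. This is a teaching sketch, not the algorithm as implemented in Partek Genomics Suite: the choice of Euclidean distance and average linkage is an assumption, and the function name and example points are hypothetical.

```python
from itertools import combinations
from math import dist

def hierarchical_clustering(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (cluster_id_a, cluster_id_b, distance).
    Original points have ids 0..n-1; each merge creates the next id.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)

    def average_distance(a, b):
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        merges.append((a, b, average_distance(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

history = hierarchical_clustering([(0.0,), (0.1,), (5.0,), (5.2,)])
```

The merge history is exactly the information a dendrogram draws: which clusters were joined, in what order, and at what distance. Note that no sample attribute is consulted at any step, which is what makes the method unsupervised.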

    hashtag
    Visualizing Hierarchical Clustering

    To illustrate the capabilities and customization options of hierarchical clustering in Partek Genomics Suite, we will explore an example of hierarchical clustering drawn from the tutorial Gene Expression Analysis. The data set in this tutorial includes gene expression data from patients with or without Down syndrome. Using this data set, 23 highly differentially expressed genes between Down syndrome and normal patient tissues were identified. These 23 differentially regulated genes were then used to perform hierarchical clustering of the samples. Follow the steps outlined in Performing hierarchical clustering to perform hierarchical clustering and launch the Hierarchical Clustering tab (Figure 1).

    Figure 1. Heatmap showing results of hierarchical clustering

    The right-hand section of the Hierarchical Clustering tab is a heat map showing relative expression of the genes in the list used to perform clustering. The heat map can be configured using the properties panel on the left-hand side of the tab. In this example, the low expression value is colored in green, the high expression value is in red, and the mid-point value between min and max is colored in black. The dendrograms on the left-hand side and top of the heat map show clustering of samples as rows and features (probes/genes in this example) as columns. Columns are labeled with the gene symbol if there is enough space for every gene to be annotated. Rows are colored based on the groups of the first sample categorical attribute in the source spreadsheet. The sample legend below the heat map indicates which colors correspond to which attribute group. In this example, Down syndrome patient samples are red and normal patient samples are orange.

    The heat map can be configured using the properties panel on the left-hand side of the Hierarchical clustering tab.

    hashtag
    Configuring the Hierarchical Clustering Plot

    hashtag
    Labeling Sample Groups in the Heat Map

    • Select the Rows tab

    • Verify that Type appears in the annotation box

    • Set Width (in pixels) to 25

    This will increase the width of the color box indicating sample Type.

    • Select Show Label

    • Set Text size to 12

    • Set Text angle to 90

    This angle is relative to the x-axis. When set to 90, the text will run along the y-axis.

    • Select Apply

    The sample attributes are now labeled with group titles (Figure 2).

    Figure 2. Labeling heat map with sample attribute groups

    hashtag
    Adding a Sample Attribute to the Heat Map

    • Select the Rows tab

    • Select Tissue from the New Annotation drop-down menu

    • Select Apply

    Color blocks indicating the tissue of each sample have been added to the row labels and sample legend (Figure 3).

    Figure 3. Sample attributes can be added to the heat map as sample labels

    hashtag
    Changing the Orientation of the Rows and Columns

    By default, Partek Genomics Suite displays samples on rows and features on columns. We can transpose the heat map using the Heat Map tab in the plot properties panel.

    • Select the Heat Map tab

    • Select Transpose rows and columns in the Orientation section

    • Select Apply

    The plot has been transposed with samples on columns and features on rows. The label for the sample groups is now in the vertical orientation because the settings we applied to Rows have been applied to Columns.

    • Select the Columns tab

    • Select the Type track

    • Set Text angle to 0

    The sample group label for Type is now visible (Figure 4).

    Figure 4. Heat map columns and rows can be transposed

    hashtag
    Flipping Columns or Rows

    Each cluster node, except those at the bottom level of the dendrogram, has two sub-cluster branches (legs). The order of the two branches is arbitrary, so the positions of the two sub-clusters can be flipped within the cluster. This does not change the clustering, only the position of the clusters on the plot.

    • Select () from the Mouse Mode icon set to activate Flip Mode

    • Click a line (or draw a bounding box on a line using the left mouse button) that represents a sub-cluster branch (or dendrogram leg) to swap the selected leg with the other leg in the same parent cluster. In this example, clicking on the bottom line will move it to the top of the heat map (Figure 5).

    Figure 5. Rows and columns can be flipped by using Flip Mode to select dendrogram legs
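To see why flipping changes only the display order, consider a dendrogram modeled as nested pairs. The Python sketch below is a hypothetical model (the tree, leaf names, and function names are ours, not part of the software): swapping the two legs of a node reorders the leaves but leaves the set of clusters untouched.

```python
def leaf_order(node):
    """Return the leaves of a nested-pair dendrogram in display order."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaf_order(left) + leaf_order(right)

def clusters(node):
    """Return every cluster in the dendrogram as a set of leaf names."""
    if isinstance(node, str):
        return {frozenset([node])}
    left, right = node
    return clusters(left) | clusters(right) | {frozenset(leaf_order(node))}

def flip(node):
    """Swap the two legs of a cluster node."""
    left, right = node
    return (right, left)

tree = (("A", "B"), ("C", ("D", "E")))
flipped = flip(tree)
```

Here `leaf_order(flipped)` starts with C, D, E instead of A, B, yet `clusters(tree)` and `clusters(flipped)` are identical.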

    hashtag
    Changing Heat Map Colors

    The minimum, maximum, and midpoint colors of the heat map intensity plot can be customized.

    • Select the Heat Map tab

    • Set Min color to light blue using the color picker tool

    • Set Max color to yellow using the color picker tool

    The heat map and plot intensity legend now show maximum values in yellow and minimum values in light blue with a black midpoint (Figure 6). The data range can also be customized by changing the values of Min and Max.

    Figure 6. Heat map colors for minimum, maximum, and midpoint intensity can be customized
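A three-color scale of this kind can be sketched as two linear gradients joined at the midpoint. The Python below is an illustration under assumptions: we assume linear interpolation in RGB space, the specific RGB values for light blue and yellow are our own choices, and `heat_color` is a hypothetical name. It requires Min to be smaller than Max.

```python
def heat_color(value, vmin, vmax,
               min_rgb=(173, 216, 230),  # assumed shade of light blue
               mid_rgb=(0, 0, 0),        # black midpoint
               max_rgb=(255, 255, 0)):   # yellow
    """Map a value to an RGB color on a min -> mid -> max gradient."""
    mid = (vmin + vmax) / 2
    # Values outside the data range are clamped to the end colors.
    value = min(max(value, vmin), vmax)
    if value <= mid:
        low, high, t = min_rgb, mid_rgb, (value - vmin) / (mid - vmin)
    else:
        low, high, t = mid_rgb, max_rgb, (value - mid) / (vmax - mid)
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))

lowest = heat_color(-2, -2, 2)
middle = heat_color(0, -2, 2)
highest = heat_color(5, -2, 2)
```

Narrowing the Min and Max values compresses the gradient, so more cells saturate at the end colors.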

    hashtag
    Zooming to Selected Rows/Columns

    We can use the hierarchical clustering heat map to examine groups of genes that exhibit similar expression patterns, for example, genes that are up-regulated in Down syndrome samples and down-regulated in normal samples.

    • Select () from the Mouse Mode icon set to activate Selection Mode

    • Select the middle cluster of the rows dendrogram as shown (Figure 7) by clicking on the line or drawing a bounding box around the line

    The lines within the selected cluster will be bold and the corresponding columns (or rows) on the spreadsheet in the analysis tab will be highlighted.

    Figure 7. Selecting a dendrogram cluster using Selection Mode

    • Right-click anywhere in the viewer

    • Select Zoom to Fit Selected Rows

    The same steps can be used to zoom into columns or rows. Here, we have zoomed in on rows, but not columns to show the expression levels of the selected genes for all samples (Figure 8).

    Figure 8. Viewing only selected genes for all samples

    To reset the zoom, select () on the y-axis to show all rows and on the x-axis to show all columns.

    • Select () on the y-axis to show all rows

    • Left click anywhere in the hierarchical clustering plot to deselect the dendrogram

    hashtag
    Exporting a List of Genes From a Selected Cluster

    Partek Genomics Suite can export a list of genes from any cluster selected, allowing large gene sets to be filtered based on the results of hierarchical clustering.

    • Select () from the Mouse Mode icon set to activate Selection Mode

    • Select the bottom cluster of the rows dendrogram

    • Right-click to open the pop-up menu

    • Select Create Row List... (Figure 9)

    Figure 9. Creating gene list from selected cluster

    • Name the gene set down in normal

    • Select OK

    • Save the list as down in normal

    In the Analysis tab, there is now a spreadsheet row_list (down in normal.txt) containing the 6 genes that were in the selected cluster. The same steps can be used to create a list of samples from the hierarchical clustering by selecting clusters on the sample dendrogram.

    hashtag
    Saving Plot Properties

    Once you have created a customized plot, you can save the plot properties as a template for future hierarchical clustering analyses.

    • Select the Save/Load tab

    • Select Save current...

    • Name the current plot properties template; we selected Transposed Blue and Yellow

    The new template now appears in the Save/Load panel as an option. To load a template, select it in the Save/Load panel and select Load selected. Note that all properties will be applied, including Min and Max values and sample groups (based on the column number of the attribute in the source spreadsheet), which may not be appropriate for a different data set.

    hashtag
    Exporting the Hierarchical Clustering Plot Image

    The hierarchical clustering plot can be exported as a publication quality image.

    • Select the Hierarchical Clustering tab

    • Select File from the main toolbar

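    • Select Save Image As... from the drop-down menu

    • Select a destination and name for the file

    • Select PNG or your preferred image type from the pull-down menu

    • Select Save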

    hashtag
    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Violin Plot

    The Violin plot in Partek Genomics Suite is similar to the Profile Trellis plot in that it displays probe(set)/gene intensity values across samples and genes. However, the Violin plot has additional options not shared by the Profile Trellis plot. Here, we will explore one use case for the Violin plot.

    hashtag
    Displaying intensity value ranges for multiple genes grouped by categorical variables

    For this example, we will use the data set and lists created in the Gene Expression tutorial. We have a list of 23 genes that are differentially regulated in tissue samples from patients with Down syndrome and normal controls. We want to display the mean intensity values for Down syndrome and normal samples for each of the 23 genes on a single plot. To do this, we first need to filter the probe intensities spreadsheet to include only the intensity values for the 23 genes of interest.

    With the probe intensities spreadsheet and the gene list open in the Analysis tab, follow these steps to filter the probe intensities spreadsheet.

    • Select the probe intensities spreadsheet in the spreadsheet tree; here, it is Down_Syndrome-GE

    • Select Filter from the main task bar

    • Select Filter Columns

    • Select Filter Columns Base on a List... (Figure 1)

    Figure 1. Invoking filter columns by a list

    The Filter Columns dialog will open (Figure 2).

    Figure 2. Configuring the Filter Columns dialog to filter by probe set ID

    • Select your gene list from the Filter base on spreadsheet drop-down menu; here, we selected Down_Syndrome_vs._Normal

    • Select the column of your gene list that matches the column IDs you want to filter from your probe intensities spreadsheet; here, we selected 2. Probeset ID

    • Select OK to apply the filter

    A black and yellow horizontal bar will appear at the bottom of the spreadsheet. This is the filter indicator showing the proportion of columns (genes/probesets) filtered out (black) and retained (yellow). To continue working with the filtered probeset intensities, we can clone the filtered spreadsheet.

    • Right-click on the filtered probe intensities spreadsheet in the spreadsheet tree

    • Select Clone... from the pop-up menu (Figure 3)

    Figure 3. Cloning a spreadsheet with a filter applied will clone only the retained rows/columns

    • Name the new spreadsheet; we chose 2

    • Select OK

    The cloned spreadsheet is a temporary file. To ensure we can use it again if we close Partek Genomics Suite, we should save the filtered probe intensities spreadsheet.

    • Select ()

    • Name the new file; we chose Down_Syndrome_vs_Normal_Probe_Intensities

    Now we have a spreadsheet containing only the probe intensity values for our 23 genes of interest (Figure 4).

    Figure 4. Filtered probe intensities spreadsheet
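The effect of filtering columns by a list can be sketched in Python. This is a hypothetical miniature of the spreadsheet, with made-up column names and values; the function name `filter_columns` is ours. It keeps the sample attribute columns plus any column whose ID appears in the gene list.

```python
# Hypothetical probe intensities spreadsheet: samples on rows, probes on columns.
header = ["Sample", "Type", "probe_1", "probe_2", "probe_3"]
rows = [
    ["S1", "Down syndrome", 7.1, 5.0, 9.3],
    ["S2", "Normal", 6.2, 5.1, 8.8],
]
gene_list_ids = {"probe_1", "probe_3"}   # IDs taken from the gene list column
attribute_columns = {"Sample", "Type"}   # sample attributes are always kept

def filter_columns(header, rows, keep_ids, always_keep):
    """Keep attribute columns and the columns whose ID is in keep_ids."""
    kept = [i for i, name in enumerate(header)
            if name in keep_ids or name in always_keep]
    new_header = [header[i] for i in kept]
    new_rows = [[row[i] for i in kept] for row in rows]
    return new_header, new_rows

filtered_header, filtered_rows = filter_columns(
    header, rows, gene_list_ids, attribute_columns
)
```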

    We can now invoke the Violin plot. Make sure to have the filtered probe intensities spreadsheet selected (in blue) in the spreadsheet tree as shown (Figure 4).

    • Select View from the main taskbar

    • Select Violin Plot from the menu

    A Violin Plot tab will open (Figure 5). This plot shows the intensity value ranges of the 23 genes (probe sets) for all samples as violin plots.

    Figure 5. Viewing violin plots for 23 genes

    • Select View from the main taskbar

    • Select Toggle Properties

    We can now see the plot properties panel to the left of the violin plot (Figure 6).

    Figure 6. The violin plot can be configured using the plot properties panel

    Although it is called the Violin plot, this visualization can also be used to display box and whisker plots, error bar plots, and gradient plots. For this example, we will generate box and whisker plots, summarized by Type (Down syndrome and normal), for each gene.

    • Select Box and Whisker Plot from the Plot type drop-down menu

    • Select Type from the Summarize by drop-down menu; this can be any categorical variable

    • Select Hide legend from Legend Options

    The modified plot shows box and whisker plots, Down syndrome samples in red and normal in blue, for each gene (Figure 7).

    Figure 7. Viewing average probe intensity values for two groups across 23 genes as box and whisker plots

    To improve our view of the gene symbols, we can modify the X-axis legend.

    • Select X-Axis from the tabs in the plot properties panel

    • Set Text angle to 90 under Labels

    • Uncheck Truncate labels under Labels

    Figure 8. Configuring the X-axis label

    The gene symbol for each column should now be visible (Figure 9). In cases where probe intensities for your genes of interest fall across a wide range, it may be helpful to normalize the probe intensity distributions of each gene. This is equivalent to what is done to display a heat map of probe intensity values.

    Figure 9. X-axis now labeled with gene symbols for each gene

    • Select the Style tab

    • Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization options

    • Select Apply

    The box and whisker plots are now centered with a mean of zero and scaled to have a standard deviation of one (Figure 10). Similar to a heat map, this makes it easier to visualize which genes are upregulated and which are downregulated. Here, we can see that most of the 23 genes are expressed more highly in Down syndrome patients.

    Figure 10. Viewing normalized box and whisker plots
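The Standardize option corresponds to the familiar z-score transformation applied to each column. A minimal Python sketch follows; the function name and values are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption about the exact formula.

```python
from statistics import mean, stdev

def standardize(column):
    """Shift a column to a mean of zero and scale to a standard deviation of one."""
    center = mean(column)
    spread = stdev(column)  # assumption: sample standard deviation (n - 1)
    return [(value - center) / spread for value in column]

z_scores = standardize([2.0, 4.0, 6.0])
```

After standardizing, genes with very different raw intensity ranges share a common scale, so differences between groups are easier to compare across genes.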

    Plots can also be split by categorical variables. We can use this to visualize differential expression of genes between Down syndrome and normal patients in different tissue types.

    • Select Configure profile

    • Select Switch to Advanced (Figure 11)

    Figure 11. Simple options for configuring profiles in the plot

    • Select Sub-Plot for Tissue (Figure 12)

    Figure 12. Configuring plot properties to split by Tissue

    • Select OK

    Several options will need to be reconfigured before we apply this change.

    • Select Standardize - shift column to mean of zero and scale to standard deviation of one from the Normalization section

    • Select the X-axis tab

    • Set Text Angle to 90

    There should now be a sub-plot for each category; in this case, there are four sub-plots, one for each tissue (Figure 13). There are no error bars for several plots because there are not enough samples in those categories.

    Figure 13. Splitting a plot by a categorical factor, Tissue, and grouping by another categorical variable, Type

    These sub-plots can be displayed all together, or individually.

    • Select 1 from the Items/Page drop-down menu

    You can now move through the sub-plots by selecting Next >.

    • Select All from the Items/Page drop-down menu to return to the 2x2 view

    This data can also be displayed as a gradient plot (Figure 14) or error bar plot (Figure 15) by changing the Plot type using the drop-down menu in the Style tab. By default, the shading range in the gradient plot and the error bars show +/-1 standard deviation from the mean.

    Figure 14. Gradient plot

    Figure 15. Error bar plot

    The final option, violin plot, cannot be used to display samples grouped by a categorical variable. To view a violin plot, we must remove the Summarize by selection.

    • Select (One profile per sub-plot) from the Summarize by drop-down menu

    • Select Violin plot from the Plot type drop-down menu

    • Select None - do not adjust values for Normalization

    The plot now displays violin plots for each gene showing the distribution of probe intensity values for each tissue in a separate sub-plot (Figure 16).

    Figure 16. Violin plots for each gene, sub-plots for each tissue

    hashtag
    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Select Apply
    Select Apply
    Select Create Row List... (Figure 9)
    Select a destination and name for the file
  • Select PNG or your preferred image type from the pull-down menu

  • Select Save

    Select Filter Columns Base on a List... (Figure 1)

    Select Apply to modify the plot

    Uncheck Show Outline under Blocks

  • Uncheck Columns under Attributes

  • Select Apply (Figure 8)

  • Deselect Truncate labels

  • Deselect Show outline

  • Deselect Columns

  • Select Apply

  • Select Apply


    Annotation

    hashtag
    Associating a Spreadsheet with an Annotation File

    For Partek Genomics Suite to recognize an annotation spreadsheet, it must meet several requirements. First, there must be a column header row in the annotation file. Second, there must be a column in the annotation file that matches the identifiers in your data spreadsheet. Third, any text field above the column header row must start with #. Fourth, the text fields must be tab or comma delimited.

    We will illustrate associating a spreadsheet with an annotation file using an imported .txt data file from an Illumina HumanHT-12 v4.0 Gene Expression BeadChip array and the HumanHT-12 v4.0 Whole-Genome Manifest File (TXT Format) from Illumina.

    • Open the annotation file with a text editor such as Notepad++, WordPad, or TextEdit

    Microsoft Excel is not recommended for viewing or editing text files because, with default settings, it converts some gene names to dates and others to floating-point numbers.

    • Verify that a column in the annotation file matches the identifiers in your data spreadsheet (e.g. probe ID); the identifier must be unique to each row

    • Remove the text before the column header row (Figure 1) or add # to the start of each of those lines

    • Save the annotation file as a .txt file

    Figure 1. The HumanHT-12 v4.0 Gene Expression BeadChip annotation file contains several rows of information prior to the column header row. To use this annotation file in Partek Genomics Suite, we delete any rows prior to the column headers row.
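For large manifests, the same cleanup can be scripted instead of done by hand. The Python sketch below takes the second approach, prefixing every line before the column header row with #. The example lines are hypothetical and do not reproduce the real manifest contents, the function name is ours, and we assume the file is tab-delimited with a known first column name.

```python
def comment_preamble(lines, header_first_column):
    """Prefix every line above the column header row with '#'.

    The header row is recognized by the name of its first tab-delimited column.
    """
    cleaned, header_seen = [], False
    for line in lines:
        if not header_seen and line.split("\t")[0] == header_first_column:
            header_seen = True
        if header_seen or line.startswith("#"):
            cleaned.append(line)
        else:
            cleaned.append("#" + line)
    if not header_seen:
        raise ValueError(f"no header row starting with {header_first_column!r}")
    return cleaned

# Hypothetical manifest lines (not the actual file contents).
manifest = [
    "Some descriptive text",
    "[Probes]",
    "Probe_Id\tILMN_Gene\tChromosome",
    "ILMN_0001\tGENE_A\t1",
]
cleaned_manifest = comment_preamble(manifest, "Probe_Id")
```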

    • Right-click the spreadsheet you want to annotate in the spreadsheet tree panel, select Properties from the pop-up menu (Figure 2) or select Properties from the File menu on the main toolbar

    Figure 2. Changing the spreadsheet properties

    Depending on how you imported the data, you may see a Configure Spreadsheet dialog (Figure 3). Select the most appropriate option for your data; here we have chosen Genomic microarray.

    Figure 3. The Configure Spreadsheet dialog may appear depending on how you imported your data

    The Configure Genomic Properties dialog will now open.

    • Select the appropriate option for Choose the type of genomic data; here we have chosen Gene Expression (Figure 4).

    Figure 4. Selecting the type of genomic data

    • Select the appropriate options for Location of genomic features in spreadsheet

    Selecting Gene Symbol instead of Marker ID allows biological interpretation tasks like GO Enrichment or Pathway Enrichment to be performed without an annotation file because the gene symbol can be used to look up the gene set or pathway database.

    Location of genomic features in spreadsheet allows you to specify whether genomic features (e.g. genes, miRNAs, probes, SNPs, CpGs etc) are represented by columns or rows. For Feature in column label, each feature is on a column and each row is a sample. For Feature in column, each feature is on a row and the feature ID for each feature is located in the column chosen with the drop-down menu.

    Choose chips/reference and annotation files allows you to specify an annotation file to associate with the spreadsheet.

    • Select Browse... from Choose chips/references and annotation files

    • Select your annotation spreadsheet file using the file selection interface

    If the genomic position information from the annotation file cannot be automatically parsed, the Configure Annotation dialog will launch. This dialog allows you to choose which columns in the annotation file give the identity and genomic location of the features in your data spreadsheet. There are four options depending on if and how chromosome coordinates are described in the annotation file.

    • Select the appropriate option for your annotation file; we have selected Chromosome is in one column and the physical position is in another column (eg: chr1, 100 or chr1, 100-200)

    The Choose the columns section displays the annotation file spreadsheet with options to choose which columns are the Marker ID, Chromosome, and Physical Position (Figure 5).

    • Select the column that matches the feature IDs in your data spreadsheet for Marker ID; we have chosen Probe_Id for Marker ID.

    • Select the column(s) that match the chromosome location data; we have chosen Chromosome for Chromosome and Probe_Coordinates for Physical Position.

    • Select Close to return to the Configure Genomic Properties dialog

    An index file for the genomic location data of the annotation file is generated in the same folder as the annotation file; it has the same file name as the annotation file, but the file extension .idx. If you need to re-configure the genomic location field in the annotation file, first manually delete the .idx file and re-do the above steps to generate a new index file for the annotation file.

    Figure 5. Specifying the columns that contain the genomic locations of markers in the annotation file

    The Chip/Reference text field will be populated with the annotation file name. You can edit this text field if you wish.

    For the Annotation column with gene symbols or miRNA names section, if Gene symbol instead of Marker ID is selected, this field is automatically populated with the gene symbol column; if it is not selected, you will need to manually specify the column in the annotation file that corresponds with gene symbols or miRNA names.

    • Select Set Column:

    • Select the appropriate column from the dialog; here we have selected ILMN_gene (Figure 6)

    • Select OK

    Figure 6. Choosing the annotation column with gene symbols

    Species and gene symbol information is required for biological interpretation analysis.

    • Select the correct species and genome build from the drop-down menus; we have chosen Homo sapiens and hg19 (Figure 7)

    • Select OK to apply the annotation file to your data spreadsheet

    Figure 7. Choosing annotation file using the Configure Genomic Properties dialog

    To verify that the annotation has been added, we can try to add annotation information to the spreadsheet when the features are on rows in the spreadsheet.

    • Right-click on a column in the annotated data file spreadsheet

    • Select Insert Annotation from the pop-up menu (Figure 8)

    Figure 8. Adding an annotation column to data spreadsheet

    The Column Configuration section of the Add Rows/Columns to Spreadsheet dialog should contain all the feature annotations from the annotation file spreadsheet (Figure 9). Here we selected ILMN_Gene, which will add gene name information as a column next to 1. ID_REF.

    Figure 9. Annotations from the annotation spreadsheet file should appear as options in the Column Configuration section of the Add Rows/Columns to Spreadsheet dialog
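Conceptually, inserting an annotation column amounts to looking up each feature ID from the data spreadsheet in the annotation file. The Python sketch below illustrates that idea only; it is not how Partek Genomics Suite implements the feature. The column names ID_REF, Probe_Id, and ILMN_Gene come from the example above, while the function name `insert_annotation` and the choice to leave unmatched features blank are assumptions.

```python
def insert_annotation(data_rows, annotation_rows,
                      data_id="ID_REF", annotation_id="Probe_Id",
                      field="ILMN_Gene"):
    """Return copies of data_rows with one annotation field added by ID lookup."""
    # Map each annotation ID to the value of the requested field.
    lookup = {row[annotation_id]: row[field] for row in annotation_rows}
    # Assumption: features missing from the annotation get an empty string.
    return [{**row, field: lookup.get(row[data_id], "")} for row in data_rows]
```

The input rows are not modified; each returned row is a new dictionary carrying the extra column.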

    Building an Annotation File

    Annotation files for most commercial arrays are available from the chip manufacturer. If you have a custom chip or want to use a customized annotation file, you can create an annotation file that will allow you to add annotations to your features (e.g. probe IDs) when the features are represented by rows on the spreadsheet. Your annotation file must meet the following criteria:

    • The annotation file must have a column header with a label for each column

    • A column in the annotation file must correspond to the feature ID column of your data spreadsheet

    • Any comments before the header must start with # or the header will not be recognized

    To invoke a genome view of your data, your annotation file must also have one or more columns that contain the genomic location in a format that Partek Genomics Suite can recognize: the chromosome and the base pair location (start and stop, or physical position). Cytoband and/or strand can also be included.
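Before loading a custom annotation file, it can help to check the basic criteria listed above with a short script. The following Python sketch is one way to do so under stated assumptions: it treats the file as tab delimited if the header contains a tab and as comma delimited otherwise, and the function name `read_annotation_header` is hypothetical. It does not replicate the checks that Partek Genomics Suite itself performs.

```python
import csv


def read_annotation_header(text: str, id_column: str) -> list[str]:
    """Return the header labels of an annotation file, or raise ValueError."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Comment lines before the header must start with "#"; skip them.
    while lines and lines[0].startswith("#"):
        lines.pop(0)
    if not lines:
        raise ValueError("no header row found")
    header_line = lines[0]
    # Assumption: a tab in the header means tab delimited, otherwise comma.
    delimiter = "\t" if "\t" in header_line else ","
    header = next(csv.reader([header_line], delimiter=delimiter))
    if any(not label.strip() for label in header):
        raise ValueError("every column needs a label")
    if id_column not in header:
        raise ValueError(f"feature ID column {id_column!r} not in header")
    return header
```

A typical call passes the file contents and the name of the column that matches the feature IDs in the data spreadsheet, for example `read_annotation_header(open(path).read(), "ProbeID")`.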

    The table below provides possible column labels, a description of the format for that field, and an example.

    | Column label | Description of format | Example |
    | --- | --- | --- |
    | chromosome | a chromosome label | 3 |
    | start | an integer, the start position (in base pairs) of the feature | 69871322 |
    | stop | an integer, the stop position (in base pairs) of the feature | 70100176 |
    | physical position | an integer, the position (in base pairs) of the feature | 70100176 |
    | strand | + for top, - for bottom | + |
    | genomic_coordinates | chromosome:start-stop | 3:69871322-70100176 |

    The fields of the annotation file must be tab or comma delimited.

    Here are a few examples of the first two rows of annotation files:

    • Using Agilent format

    | ProbeID | GeneName | GenomicCoordinates | Cytoband |
    | --- | --- | --- | --- |
    | A_44_P1025812 | TC521361 | chr12:2546883-2546824 | rn |

    • Using Affymetrix SNPs format

    | Probe Set ID | Chromosome | Physical Position | Strand | Cytoband |
    | --- | --- | --- | --- | --- |
    | SNP_A-1512540 | 9 | 22205296 | - | p21.3 |

    • Using Affymetrix exons format

    | probeset_id | seqname | strand | start | stop |
    | --- | --- | --- | --- | --- |
    | 2315588 | chr1 | + | 1155398 | 1155624 |

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.