Scientists often develop lists of genes, probes, transcripts, SNPs, and genomic regions of interest from analysis tools, research papers, and databases. Using Partek Genomics Suite, these lists can be integrated with genomics data sets, analyzed with powerful statistics, and visualized for new insights.
This user guide will illustrate:
This user guide does not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature in Partek Genomics Suite that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a Partek Genomics Suite feature on an imported list that you think should be included in this user guide, please let us know.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
The Gene Ontology (GO) Enrichment p-value calculation uses either a Chi-Square or Fisher’s Exact test to compare the genes in the significant gene list with the background genes, that is, all genes that could have been detected in the experiment. For a microarray experiment, the background genes are all genes on the chip/array; for a next generation sequencing experiment, all genes in the species transcriptome are considered background genes.
Because the calculation is essentially comparing overlapping sets of genes and does not use intensity values, GO Enrichment can be performed on an imported gene list even without any numerical values. GO Enrichment is available through the Gene Expression workflow.
If no annotation file has been specified for the gene list, GO Enrichment uses the full species transcriptome as the background genes. This is suitable for next generation sequencing experiments, but for microarray experiments only the genes on the chip/array are an appropriate background. Please contact our technical support department if you need assistance with this step.
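The over-representation idea behind GO Enrichment can be illustrated with a one-sided Fisher's exact test: for a 2×2 table of (significant vs. not significant) × (in the GO category vs. not), the p-value is a hypergeometric upper-tail probability. The following is a minimal Python sketch under stated assumptions: `enrichment_p_value` is a hypothetical helper, gene lists are plain sets of symbols, the gene names are placeholders, and neither the Chi-Square alternative nor Partek's exact implementation is reproduced here.

```python
from math import comb


def enrichment_p_value(significant, category, background):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns the probability of seeing at least as many category genes among
    the significant genes as actually observed, given the background.
    Hypothetical helper for illustration only.
    """
    background = set(background)
    significant = set(significant) & background  # ignore genes outside the background
    category = set(category) & background

    N = len(background)              # all background genes
    K = len(category)                # background genes annotated to the category
    n = len(significant)             # significant genes
    k = len(significant & category)  # significant genes in the category

    # P(X >= k) for X ~ Hypergeometric(N, K, n); comb(a, b) is 0 when b > a.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


background = {f"GENE_{i}" for i in range(10)}
category = {"GENE_0", "GENE_1", "GENE_2"}
print(enrichment_p_value({"GENE_0", "GENE_1", "GENE_2"}, category, background))  # 1 / comb(10, 3)
```

Drawing all three significant genes from a three-gene category in a ten-gene background gives 1/120, a small p-value; changing the background set changes the result, which is why the choice of background genes matters.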
Like GO Enrichment, Pathway Enrichment does not require numerical values; instead, it compares a list of significant genes with a list of background genes. Consequently, Pathway Enrichment can be used with an imported gene list even without any numerical values. The background genes default to the species transcriptome, but can be set to a specific set of genes if the gene list has been associated with an annotation file.
A gene list can be used to filter another spreadsheet. As an example, we will filter the results of an ANOVA on microarray data using a gene list. This will create a spreadsheet with ANOVA results for only the genes included in our gene list.
Open the filtering gene list and target spreadsheets
Select the target spreadsheet in the spreadsheet tree; in this example, it is the ANOVA results spreadsheet, which has genes on rows
Select Filter from the main toolbar
Select Filter Rows Based on a List... from Filter Rows (Figure 1)
Select the matching column of your target spreadsheet from the Key column drop-down menu; here we have selected 4. Gene Symbol (Figure 2)
Select the filtering gene list from the Filter based on spreadsheet drop-down menu; here we have selected 1 (Gene List.txt)
Select the matching column of your filtering gene list from the Key column drop-down menu; here we have selected 1. Symbol
Select OK to apply the filter
The target spreadsheet will display the filtered rows (Figure 3). Note that the number of rows has gone from 22,283 prior to filtering (Figure 1) to 153 after filtering (Figure 3).
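Conceptually, Filter Rows Based on a List keeps the rows of the target spreadsheet whose key-column value appears in the key column of the filtering list. A minimal Python sketch of that idea follows; `filter_rows_by_list` is a hypothetical helper, rows are assumed to be dictionaries keyed by column name, the p-values are toy values, and matching is exact string comparison (how Partek treats letter case or duplicate identifiers is not covered here).

```python
def filter_rows_by_list(rows, key_column, filter_values):
    """Keep rows whose `key_column` value appears in `filter_values`.

    rows: list of dicts (one per spreadsheet row); row order is preserved.
    filter_values: identifiers from the filtering list's key column.
    """
    keep = set(filter_values)  # set membership makes each lookup fast
    return [row for row in rows if row.get(key_column) in keep]


anova_rows = [
    {"Gene Symbol": "TP53", "p-value": 0.01},
    {"Gene Symbol": "BRCA1", "p-value": 0.20},
    {"Gene Symbol": "EGFR", "p-value": 0.03},
]
gene_list = ["EGFR", "TP53", "MYC"]
filtered = filter_rows_by_list(anova_rows, "Gene Symbol", gene_list)
print(len(filtered))  # 2
```

Identifiers in the gene list that are absent from the target spreadsheet (MYC here) simply produce no rows, which is one reason the filtered row count can be smaller than the length of the gene list.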
To use this filtered list for downstream analysis, clone it into a new spreadsheet.
Right-click the open spreadsheet in the spreadsheet tree
Select Clone...
Use the Clone Spreadsheet dialog to name the new spreadsheet and choose its place in the spreadsheet hierarchy
Select OK
The new spreadsheet will open. If you want to use the new spreadsheet again in the future, be sure to save it.
If your imported data contains a list of p-values, you can use any of the available multiple test corrections.
Select Stat from the main toolbar
Select Multiple Test
Select Multiple Test Corrections to launch a dialog with the available options (Figure 4); the corrected p-value column(s) will be added to the right of the selected p-value column(s)
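For reference, two widely used multiple test corrections are Bonferroni and the Benjamini-Hochberg step-up false discovery rate. The sketch below assumes a plain list of p-values; the function names are hypothetical, and the exact set of options in the Multiple Test Corrections dialog, and how Partek implements them, may differ.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up FDR adjusted p-values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

Bonferroni controls the chance of any false positive and is therefore stricter; the step-up FDR adjustment controls the expected proportion of false positives among the genes called significant.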
A variety of profile plots can be used to visualize the numerical data associated with your imported gene list.
Select View from the main toolbar
Select any applicable option
If you have imported numerical data associated with genes (like p-values or fold-changes), you can visualize these values in the Genome Browser once an annotation file that contains genomic location information has been associated with the spreadsheet.
Right-click on a row header in the imported gene list spreadsheet
Select Browse to location
If the annotations have been configured properly, you should see a Regions track for the first column of numerical data, a cytoband track, and an annotation track. You can also add another track to display a second column of numerical data.
Select New Track
Select Add a track from spreadsheet
Select Next >
A new track titled Regions will be added.
Select Regions in the track preferences panel to edit it
Select the other numerical column in the Bar height by drop-down menu
For a gene list with expression values on each sample, clustering can be performed. Access the clustering function through the toolbar, not from a workflow. The workflow implementations assume that the data to be clustered are found on a parent spreadsheet and the list of genes is in a child spreadsheet.
Select Tools from the main toolbar
Select Discover then Hierarchical Clustering
Hierarchical Clustering assumes that samples are on rows and genes are on columns, so consider transposing your data if this is not the case. If you have only one row or column of data, cluster only on the dimension with multiple entries by deselecting either Rows or Columns under What to Cluster in the Hierarchical Clustering dialog.
The preferred method for importing a generic data spreadsheet into Partek Genomics Suite is as a text file. Here, we illustrate importing a list of genes with p-value and fold-change from an experiment comparing two conditions.
Select File from the main toolbar
Select Import
Select Text (.csv .txt)...
Select the text file using the file browser to launch the Import .txt, .tsv, or .csv File dialog
The File Type section of the Import dialog includes a preview of the text file and import options (Figure 1).
The columns in the import file can be separated by a tab, a comma, or any other character.
For most applications, the items on the list should be on rows while attributes or values should be on columns. If a list is oriented with items on columns, select Transpose the file to import a transposed spreadsheet.
Select Next > to move to the Data Type section
Select your data type; here we have chosen Genomic Data because it is a gene list (Figure 2)
We have also deselected Is the data log transformed (LOG_base (x+offset) ) ?
Selecting Genomic Data will open a dialog after import to configure properties for the imported list including selecting the type of genomic data, the location of genomic features in the spreadsheet, the annotation column with gene symbols, the chip or reference source and annotation file, the species, and reference genome build.
Select Next >
The next step is to identify where the data starts and where the optional header is found using Identify Column Labels, Start of Data (Figure 3). The line that contains the header (if present) must precede the data. If there are lines to be skipped in the file (like comments), they may only appear at the top of the file, before the header line or data begin.
If there are many comment lines at the start of the file, you may need to select View Next 5 Records to get to the row that contains the column header. If you accidentally move past the screen that contains the header or data rows, select View Previous 5 Records.
If there are missing numerical values or empty cells in your input list, insert a special character or symbol (?, N/A, NA, etc.) in the missing cells. You will specify this character in the Missing Data Representation section of the dialog. Only one symbol can be used to represent missing values; the default missing value indicator is ?.
If a header row is present, select Col Lbls to allow you to select a column header row
Select the row where the data begins using the Begin Data selector
If any cells have a missing value, you can signify this with a special symbol selected using the Missing Data Representation panel
Select Next >
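The header, start-of-data, and missing-value settings above amount to a simple parsing rule: skip leading lines, read the column labels from one line, read records from the data start onward, and treat the missing-data symbol as an empty value. The Python sketch below illustrates this under assumptions: `read_list` and its 0-based `header_line`/`data_start` parameters are hypothetical stand-ins for the Col Lbls and Begin Data selectors, and values stay as text because column types are assigned in a later section of the dialog.

```python
import csv


def read_list(text, header_line, data_start, delimiter="\t", missing="?"):
    """Parse a delimited list into one dict per data row.

    header_line, data_start: 0-based line indices (hypothetical parameters).
    Lines before the header (e.g. comments) are skipped, and cells equal to
    `missing` become None.
    """
    lines = text.splitlines()
    header = next(csv.reader([lines[header_line]], delimiter=delimiter))
    rows = []
    for record in csv.reader(lines[data_start:], delimiter=delimiter):
        if not record:
            continue  # ignore blank lines
        values = [None if cell == missing else cell for cell in record]
        rows.append(dict(zip(header, values)))
    return rows


text = (
    "# exported gene list\n"
    "# condition A vs. condition B\n"
    "Symbol\tp-value\tfold-change\n"
    "TP53\t0.01\t2.5\n"
    "EGFR\t?\t-1.8\n"
)
rows = read_list(text, header_line=2, data_start=3)
print(rows[1])  # {'Symbol': 'EGFR', 'p-value': None, 'fold-change': '-1.8'}
```

The example file content is made up; it only shows that comment lines must come before the header and that a single missing-value symbol is recognized.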
The Preview text encoding section (Figure 4) previews the first five lines of the file, allowing you to check if the text encoding is correct.
If the text does not appear properly, use the Specify the text encoding: drop-down menu to choose the correct encoding
Select Next >
The final section of the Import .txt, .tsv, or .csv File dialog is Verify Type & Attribute of Data Columns (Figure 5). While data column type and attribute can be modified after import, it is easier and faster to select the proper options during import as multiple columns may be selected during this dialog.
Check and modify column types and attributes
If there is an identifier like a gene symbol or SNP ID, the Type field for that column should be set to text and Attribute should be set to label. Numeric values (intensities, p-values, fold-changes, etc.) should have Type set to double and Attribute set to response. The other possible value for Attribute is factor, which describes sample data. The user interface in this dialog allows you to select multiple columns at once (Ctrl+left click and Shift+left click). The interface controls are detailed in the dialog (Figure 5).
Select Finish to import the text file and open it as a spreadsheet
If Genomic data was selected in the Data Type section, the Configure Genomic Properties dialog will open (Figure 6). These options will be discussed in the next section when we add an annotation file.
Select OK
The imported spreadsheet will open (Figure 7).
There are many useful visualizations, annotations, and biological interpretation tools that can operate on a gene list. For these features to work with an imported list, an annotation file must be associated with the gene list. Additionally, many operations that work with a list of significant genes (like GO or Pathway Enrichment) require comparison against a background of “non-significant” genes. The quickest way to accomplish both is to use the “all genes” background for that organism provided by an annotation source like RefSeq or Ensembl, in .pannot (Partek annotation), .gff, .gtf, .bed, tab-delimited, or comma-delimited format. If the file is not already in a tab- or comma-delimited format, you can import it, modify it, and save it in the proper file format.
Select File from the main toolbar
Select Genomic Database under Import (Figure 1)
Select the annotation file; in this example, we select a .pannot file downloaded from the Partek distributed library file repository: hg19_refseq_14_01_03_v2.pannot
Delete or rearrange the columns as necessary; we have placed first the column with identifiers that correspond to our gene list (these identifiers should be unique)
Select File then Save As Text File... to save the annotation file; we have named it Annotation File (Figure 2)
Now we can add the annotation file to our imported gene list.
Right click 1 (gene_list.txt) in the spreadsheet tree
Select Properties from the pop-up menu
This brings up the Configure Genomic Properties dialog (Figure 3).
Select Browse under Annotation File
Choose the annotation file; we have chosen Annotation File.txt
If this is the first time you have used this annotation file, the Configure Annotation dialog will launch. Use it to choose the columns with the chromosome number and position information for each feature. Our example annotation file has chromosome, start, and stop in separate columns.
Select the proper column configuration options (Figure 4)
Select Close to return to the Configure Genomic Properties dialog
Select Set Column: to open the Choose column with gene symbols or microRNA names dialog (Figure 5)
Select the appropriate column; here the default choice of 1. Symbol is appropriate
Select OK to return to the Configure Genomic Properties dialog
Select the appropriate species and genome build options; we have selected Homo sapiens and hg19 (Figure 6)
Select OK
The annotation file is now associated with the spreadsheet, and additional tasks can be performed on the data. For example, because the annotation includes genomic locations, you can display this data in the chromosome view.
If an annotation file has been associated with a spreadsheet, annotations from the file can be added as columns in the spreadsheet when each identifier is on a row.
Right click on a column header
Select Insert Annotation
Select columns to add from Column Configuration; we have selected Chromosome, Start, and Stop (Figure 7)
Select OK
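Insert Annotation effectively looks up each row's identifier in the associated annotation file and copies the selected fields into new columns. A minimal sketch, assuming the annotation has already been loaded into a dictionary keyed by identifier; `insert_annotation`, the gene names, and the coordinates are hypothetical placeholders.

```python
def insert_annotation(rows, key_column, annotation, columns):
    """Return copies of `rows` with the annotation `columns` appended.

    annotation: dict mapping identifier -> dict of annotation fields.
    Identifiers not found in the annotation get None for each new column.
    """
    annotated = []
    for row in rows:
        fields = annotation.get(row[key_column], {})
        new_row = dict(row)  # copy so the input rows are left unchanged
        for column in columns:
            new_row[column] = fields.get(column)
        annotated.append(new_row)
    return annotated


annotation = {"GENE_A": {"Chromosome": "1", "Start": 100, "Stop": 200, "Strand": "+"}}
gene_rows = [{"Symbol": "GENE_A", "p-value": 0.01}, {"Symbol": "GENE_X", "p-value": 0.2}]
out = insert_annotation(gene_rows, "Symbol", annotation, ["Chromosome", "Start", "Stop"])
print(out[0])
```

Only the requested fields are copied (Strand is left out here), mirroring the choice of columns in Column Configuration.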
A list of SNPs using dbSNP IDs can be imported as a text file and associated with an annotation file as described for a list of genes. The annotation file you use to annotate the SNPs should minimally contain the chromosome number and physical position of each locus.
Novel SNPs, or SNPs that are not found in your annotation source, must be imported as a region list. To do this, follow the procedure for importing a region list, but use the SNP name in place of a region name.
Starting with a list of SNPs that have been associated with genomic loci using an annotation file and assigned a species with genome build, you can use Find Overlapping Genes to annotate these SNPs with the closest genes.
Select Tools from the main toolbar
Select Find Overlapping Genes (Figure 1)
Figure 1. Adding overlapping genes to a SNP list
Select Add a New Column with the Gene Nearest to the Region from the method dialog
The Report Regions from the specified database dialog will open.
Select your preferred database. Be sure to match the species and genome build of your SNP list
Select OK
This adds three columns to the SNP list spreadsheet, including Nearest Feature, which indicates the nearest gene and its strand (Figure 2).
Figure 2. Find Overlapping Genes adds three columns to a SNP list: overlapping features, nearest feature, and distance to nearest feature (bps)
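The three added columns can be understood as a nearest-interval search: genes that contain the SNP overlap it, and otherwise the nearest gene is the one with the smallest gap to the SNP position. The sketch below is illustrative only: `nearest_feature` is hypothetical, coordinates are assumed inclusive, strand is ignored, the genes are placeholders, and Partek's tie-breaking and distance conventions are not documented here.

```python
def nearest_feature(chrom, pos, genes):
    """Return (overlapping names, nearest name, distance in bp) for one SNP.

    genes: list of (name, chrom, start, stop) with inclusive coordinates.
    Distance is 0 when the SNP lies inside a gene; the nearest name and
    distance are None when no gene is on the same chromosome.
    """
    overlapping = []
    best_name, best_dist = None, None
    for name, g_chrom, start, stop in genes:
        if g_chrom != chrom:
            continue
        if start <= pos <= stop:
            overlapping.append(name)
            dist = 0
        else:
            # gap to the closer end of the gene
            dist = start - pos if pos < start else pos - stop
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    return overlapping, best_name, best_dist


genes = [("GENE_A", "1", 100, 200), ("GENE_B", "1", 500, 600), ("GENE_C", "2", 100, 200)]
print(nearest_feature("1", 450, genes))  # ([], 'GENE_B', 50)
```

A linear scan keeps the sketch short; for genome-wide gene lists, genes would typically be sorted by position and searched with bisection instead.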
To allow gene list operations such as GO Enrichment or Pathway Enrichment to be performed on the SNP list, we can set the Nearest Feature column as the gene symbol column for the spreadsheet.
Right click the spreadsheet in the spreadsheet tree
Select Properties from the pop-up menu
Select Gene symbol instead of Marker ID
Select Feature in column and select Nearest Feature (Figure 3)
Select OK
Figure 3. Setting Nearest Feature as the gene symbol allows gene list functions to be performed on a SNP list
If you have a SNP spreadsheet that was generated using Partek Genomics Suite (not imported as a .txt file), you can annotate the SNP list with gene, transcript, and exon information and with the predicted effect of each SNP.
Select Tools from the main command toolbar
Select Annotate SNVs
A region list must contain the chromosome, start location, and stop location as its first three columns. If you plan to use any feature that requires reference sequence information (like motif detection), the chromosome names in the region list must be compatible with the genomic annotation for the species.
Import the region list as described above for text files with the following options
Select Other for data type
Set chromosome as a text field
Set location start and stop as either integer or text fields
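Before importing, it can help to check a region list against these requirements. A minimal Python sketch, assuming the rows have already been split into fields; `parse_region_list` and its optional `allowed_chroms` set (standing in for the chromosome names used by the species annotation) are hypothetical.

```python
def parse_region_list(rows, allowed_chroms=None):
    """Validate rows whose first three fields are chromosome, start, stop.

    rows: sequences of strings or numbers, e.g. parsed lines of a text file.
    allowed_chroms: optional set of chromosome names used by the genome
    annotation (hypothetical parameter); other names are rejected.
    """
    regions = []
    for row_no, row in enumerate(rows, start=1):
        if len(row) < 3:
            raise ValueError(f"row {row_no}: expected chromosome, start, stop")
        chrom = str(row[0])                     # chromosome is kept as text
        start, stop = int(row[1]), int(row[2])  # integer or numeric-text fields
        if start > stop:
            raise ValueError(f"row {row_no}: start {start} is after stop {stop}")
        if allowed_chroms is not None and chrom not in allowed_chroms:
            raise ValueError(f"row {row_no}: chromosome {chrom!r} not in annotation")
        regions.append((chrom, start, stop, *row[3:]))
    return regions


print(parse_region_list([["chr1", "100", "200", "peak_1"]]))  # [('chr1', 100, 200, 'peak_1')]
```

Checking chromosome names early is most useful when the list mixes conventions such as `chr1` and `1`, which would otherwise fail to match the reference.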
Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select List of genomic regions from the Configure Spreadsheet dialog to add region to the properties (Figure 1)
Figure 1. Adding region to the properties of a spreadsheet
The spreadsheet properties will now include region. Alternatively, region can be added as a spreadsheet property from the Configure Genomic Properties dialog by selecting Advanced..., choosing region from the drop-down menu, selecting Add, and selecting OK.
If you would like to do any operation that requires looking up the reference genomic sequence information for the regions based on genomic location, you will need to specify the species for this region list.
Right-click on the imported spreadsheet in the spreadsheet tree
Select Properties
Select species from the Add Property drop-down menu and click Add
Specify the Species Name and Genome Build from the drop-down menus
Select OK
Starting with a region list, you may detect either known or de novo motifs using the ChIP-Seq workflow if your spreadsheet has been associated with a species and a reference genome.
Select ChIP-Seq from the Workflows drop-down menu
Select Motif detection from the Peak Analysis section of the workflow
Both Discover de novo motifs and Search for known motifs can be performed. Motif detection requires the genome sequence; you can specify either a .2bit file or a .fa file, which can be used to create a .2bit file
If you have a region list or a .BED file and a microarray experiment with data, you can summarize the microarray data by the genomic coordinates contained in the region list. For example, suppose the region list contains CpG islands, the experiment contains methylation percentage values for probes (β values), and you would like to summarize the methylation values of all probes in each CpG island.
Import the region list (or .BED file)
Be sure that you have added the region property. The list of region coordinates (chromosome, start, stop) from the region list will be mapped against the reference genome specified for the microarray data so specifying Species and Genome Build for your region list is unnecessary.
Open the microarray data spreadsheet; this spreadsheet must be associated with an annotation file that contains genomic location information.
Samples should be on rows and data on columns in the microarray data spreadsheet.
Select the region list spreadsheet
Right-click any column header in the region list spreadsheet
Select Insert Average from the pop-up menu (Figure 2)
Figure 2. Adding the average values for a region list
Select the microarray data spreadsheet containing the values you want to average for each region from the Get average from spreadsheet drop-down menu
There are three options for averaging the data (Figure 3). Mean of samples significant in region is used when the region list has SampleIDs from the microarray data set associated with each region. In this case, only the microarray data set samples specified for each region would be included in the mean calculation. Mean of all samples will add columns for the mean value of all probes for all samples and the number of probes for all samples in each region. Mean value for all samples separately will add two columns for each sample with the mean value of all probes for that sample and the number of probes for that sample in each region.
We have selected Mean of all samples
Select OK (Figure 3)
Figure 3. Selecting options for adding average values for regions
Columns will be added to the regions list spreadsheet. Here, we have added two columns with the average β-value for all samples in each CpG island and the number of probes in each CpG island (Figure 4).
Figure 4. Added average beta values and number of probes per CpG island
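To make the Mean of all samples option concrete, the sketch below pools the values of every probe that falls inside a region across all samples and also counts those probes. It assumes inclusive region coordinates, probe positions taken from the annotation file, and samples on rows as described above; `average_by_region`, the probe IDs, and the values are hypothetical, and Partek's handling of edge cases may differ.

```python
from statistics import mean


def average_by_region(regions, probe_positions, samples):
    """Return (mean value, probe count) for each region.

    regions: list of (chrom, start, stop), inclusive coordinates.
    probe_positions: dict probe ID -> (chrom, position), from the annotation.
    samples: one dict per sample (samples on rows) mapping probe ID -> value.
    The mean pools every non-missing value of every in-region probe across
    all samples; regions without probes get None.
    """
    results = []
    for chrom, start, stop in regions:
        in_region = [probe for probe, (c, pos) in probe_positions.items()
                     if c == chrom and start <= pos <= stop]
        values = [sample[probe] for sample in samples for probe in in_region
                  if sample.get(probe) is not None]
        results.append((mean(values) if values else None, len(in_region)))
    return results


probe_positions = {"cg_1": ("1", 110), "cg_2": ("1", 150), "cg_3": ("1", 900)}
samples = [{"cg_1": 0.2, "cg_2": 0.4, "cg_3": 0.9},
           {"cg_1": 0.4, "cg_2": 0.6, "cg_3": 0.1}]
print(average_by_region([("1", 100, 200), ("2", 1, 50)], probe_positions, samples))
```

The Mean value for all samples separately option would instead compute one mean and probe count per sample, which in this sketch amounts to calling the function once per sample.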
If you have two or more region lists with coordinates on the same reference genome, you can compare them to identify overlapping regions.
Open all region list spreadsheets that you want to compare
Select Tools from the main toolbar
Select Find Region Overlaps (Figure 5)
Figure 5. Selecting Find Region Overlaps
The Find Region Overlaps tool has two modes of operation. The first, Report all regions, creates a new spreadsheet with any regions that did not intersect and all regions of intersection between any of the input lists. For each intersection, the start and stop coordinates of the intersection and the percent overlap between the intersected region with each of the regions in the input lists are reported. The second, Only report regions present in all lists creates a new spreadsheet with the intersected regions found in all the lists.
Select your preferred mode; we have selected Only report regions present in all lists
Select Add New Spreadsheet to add any spreadsheets you want to compare; we are comparing two region list spreadsheets (Figure 6)
Select OK
Figure 6. Configuring Find Overlapping Regions
A new region list spreadsheet will be created (Figure 7). The new region list is a temporary spreadsheet so be sure to save it if you want to keep it.
Figure 7. Spreadsheet with regions present in all lists
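The Only report regions present in all lists mode can be sketched as repeated pairwise interval intersection, with percent overlap computed as the intersection length divided by the length of an input region. The Python below is a minimal illustration assuming inclusive coordinates; the function names and regions are hypothetical, the Report all regions mode is not shown, and the nested loop is quadratic, whereas a sorted sweep would scale better for large lists.

```python
def intersect_two(list_a, list_b):
    """All pairwise intersections between two lists of (chrom, start, stop)."""
    intersections = []
    for chrom_a, start_a, stop_a in list_a:
        for chrom_b, start_b, stop_b in list_b:
            if chrom_a != chrom_b:
                continue
            start, stop = max(start_a, start_b), min(stop_a, stop_b)
            if start <= stop:  # inclusive coordinates: touching ends overlap by 1 bp
                intersections.append((chrom_a, start, stop))
    return intersections


def regions_in_all_lists(*region_lists):
    """Regions covered by every input list (intersect the lists one by one)."""
    result = list(region_lists[0])
    for other in region_lists[1:]:
        result = intersect_two(result, other)
    return result


def percent_overlap(intersection, region):
    """Percentage of `region` covered by `intersection`, inclusive lengths."""
    _, inter_start, inter_stop = intersection
    _, region_start, region_stop = region
    return 100.0 * (inter_stop - inter_start + 1) / (region_stop - region_start + 1)


list_a = [("1", 101, 200), ("1", 501, 600)]
list_b = [("1", 151, 250), ("2", 101, 200)]
list_c = [("1", 181, 300)]
print(regions_in_all_lists(list_a, list_b))          # [('1', 151, 200)]
print(regions_in_all_lists(list_a, list_b, list_c))  # [('1', 181, 200)]
print(percent_overlap(("1", 151, 200), list_a[0]))   # 50.0
```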
To be annotated using the Annotate SNVs tool, an imported SNV position list must have four columns per locus:
Position of the SNP listed as chr.basePosition
Sample ID or name
The reference base
The SNP call (sample genotype base)
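A quick check of the four-column format can catch position strings that are not in chr.basePosition form before import. The sketch below is hypothetical (`parse_snv_row` is not a Partek function): it splits the position on the last dot and requires a non-empty sample, reference base, and call; the example value `1.12345` and the sample name are made up for illustration.

```python
def parse_snv_row(row):
    """Split a four-column SNV record into (chrom, position, sample, ref, call).

    row: (position, sample ID, reference base, SNP call), with the position
    written as chr.basePosition. Raises ValueError for malformed records.
    """
    position, sample, ref, call = row
    chrom, sep, base = position.rpartition(".")  # split on the last dot
    if not sep or not chrom or not base.isdigit():
        raise ValueError(f"{position!r} is not in chr.basePosition form")
    if not (sample and ref and call):
        raise ValueError(f"missing sample, reference base, or call in {row!r}")
    return chrom, int(base), sample, ref, call


print(parse_snv_row(("1.12345", "Sample_1", "A", "G")))  # ('1', 12345, 'Sample_1', 'A', 'G')
```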
Prepare input list as shown (Figure 8) with four columns describing the position, sample, reference base, and sample genotype base for each SNV
Figure 8. An imported SNV list must follow this format to be annotated by the Annotate SNVs tool. The first column must be the position, and the position must follow the format shown, chr.basePosition
Save as either a tab-separated or comma separated file
Import the table as a text file
Select Genomic data for What type of data is this file?
Set the position column Type to text
Set the other columns Type to categorical
Select Genomic location instead of marker IDs from the Choose the type of genomic data drop-down menu of the Configure Genomic Properties dialog
Specify the Species and Genome Build
Select OK
The Annotate SNVs tool can now be invoked on this spreadsheet to generate an annotation spreadsheet (Figure 9).
Figure 9. Annotate SNVs creates a new spreadsheet annotating each SNV from the source list
A BED (Browser Extensible Data) file is a special case of a region list: it is a tab-delimited text file in which the first three columns contain the chromosome, start, and stop locations. To import a BED file for use as a data region list, follow the import instructions for region lists. A BED file can also be visualized in the Genome Browser as an annotation file containing regions.
BED files do not contain sequences, and the regions in a minimal three-column BED file do not have names. For example, the UCSC Genome Browser provides an annotation BED file for CpG islands, which you might like to view in the context of a methylation microarray data set. Before you can visualize a BED file in the chromosome viewer, you must create a Partek annotation file from it.
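For orientation, the following sketch reads the first three columns of BED lines, skipping header lines that begin with `track`, `browser`, or `#`, and keeps the optional fourth (name) column when present. It follows the UCSC BED convention of 0-based start and exclusive end coordinates, so a region's length is end minus start; `read_bed_regions` and the example lines are hypothetical, and the sketch does not describe how Partek converts coordinates when creating an annotation.

```python
def read_bed_regions(lines):
    """Parse BED lines into (chrom, start, end, name, length) tuples.

    Lines starting with "track", "browser", or "#" are skipped. Coordinates
    follow the UCSC convention (0-based start, exclusive end), so the length
    of a region is end - start. The name is None when column 4 is absent.
    """
    regions = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        regions.append((chrom, start, end, name, end - start))
    return regions


bed_lines = ["track name=example", "chr1\t100\t250", "chr2\t0\t10\tisland_1"]
print(read_bed_regions(bed_lines))
```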
Select Tools from the main toolbar
Select Annotation Manager... (Figure 1)
Figure 1. Selecting Annotation Manager
Select Create Annotation from the My Annotations tab of the Annotation Manager dialog (Figure 2)
Figure 2. Creating a new annotation file
Select BED file (.bed) for Annotation Type (Figure 3)
Figure 3. Selecting annotation file type
Select Browse... under Source to specify the BED file; a default new file name and destination will populate Result, but this can be changed
You can specify the name and save location of the new annotation file under Result; we typically choose the Microarray Libraries folder
Specify the Name of the annotation database file
Select the correct Species and Genome Build for the annotation file from the drop-down menus (Figure 4)
Figure 4. Configuring annotation file creation
Preview Chromosome Names is used when the chromosome names in the original file do not match those of the genome build and need to be modified. For our example, this is unnecessary.
Select OK to create the annotation
The Annotation Manager will display the new annotation in the My Annotations tab (Figure 5)
Figure 5. Viewing created annotation in My Annotations
In order to use a BED file as an Annotation track in the Genome Browser, first create the annotation file as described above, being careful to specify the correct species and genome build.
Right-click a row on any spreadsheet that has genomic features on rows (gene lists, ANOVA results, SNP detection)
Select either Browse to Row or Browse to Location to invoke the Genome Browser tab
Select New Track from the Tracks panel of the Genome Browser (Figure 6)
Figure 6. Adding a new track to the Genome Viewer
Select Add an annotation track with genomic features from a selected annotation source from the Track Wizard dialog (Figure 7)
Figure 7. Track Wizard dialog
Select Next >
Choose the annotation file you created; here we have selected UCSC CpG Islands (Figure 8)
If your annotation file does not contain strand information for each region, deselect Separate Strands; here we have deselected it
Figure 8. Choosing the annotation file
Select Create
A new track will be created from the annotation file (Figure 9). If Separate Strands had been selected, there would be two tracks, one for each strand, like we see for the RefSeq Transcripts - 2014-01-03 (+) and (-) tracks (Figure 8).
Figure 9. Viewing the added annotation file as a track in the genome viewer
Select () to close the annotation file
Select () to save the spreadsheet
As these features require intensity (or count) data as well as experimental groups, they cannot be performed on imported lists.
If the data from imported spreadsheets has been associated with annotations, several approaches can be used to integrate multiple kinds of imported data.
The Genome Browser may be used to display data from multiple spreadsheets/experiments regardless of the type of spreadsheets (imported data or microarray or NGS experiments).
The Venn Diagram tool may be used to find overlaps based on a feature name.
The Find Overlapping Regions tool can use an imported gene list and a list of regions from a copy number or ChIP-Seq experiment to identify genomic regions in common.
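Finding overlaps by feature name, as the Venn Diagram tool does, reduces to set operations on the names in each list. A minimal two-list sketch; `venn_partition` is hypothetical and compares names exactly as written, so differently formatted identifiers (for example, aliases of the same gene) would not match.

```python
def venn_partition(names_a, names_b):
    """Split two lists of feature names into the three regions of a Venn diagram."""
    a, b = set(names_a), set(names_b)
    return {"only_a": a - b, "both": a & b, "only_b": b - a}


result = venn_partition(["TP53", "EGFR", "MYC"], ["EGFR", "MYC", "KRAS"])
print(sorted(result["both"]))  # ['EGFR', 'MYC']
```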
This User Guide did not discuss every operation that can be performed on an imported list of regions, SNPs, or genes. If there is some other feature that you would like to apply to an imported list, please contact the technical support team for additional guidance. If you have found a novel use of a feature on an imported list that you think should be included in this User Guide, please let us know.