RNA-Seq is a high-throughput sequencing technology used to generate information about a sample’s RNA content. Partek Genomics Suite offers convenient visualization and analysis of the high volumes of data generated by RNA-Seq experiments.
This tutorial illustrates:
Note: The workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version, or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.
In this tutorial, you will analyze an RNA-Seq experiment using the Partek Genomics Suite software RNA-Seq workflow. The data used in this tutorial was generated from mRNA extracted from four diverse human tissues (skeletal muscle, brain, heart, and liver) from different donors and sequenced on the Illumina® Genome Analyzer™. The single-end mRNA-Seq reads were mapped to the human genome (hg19), allowing up to two mismatches, using Partek Flow alignment and the default alignment options. The output files of Partek Flow are BAM files which can be imported directly into Partek Genomics Suite 7.0 software. BAM or SAM files from other alignment programs like ELAND (CASAVA), Bowtie, BWA, or TopHat are also supported. This same workflow will also work for aligned reads from any sequencing platform in the (aligned) BAM or SAM file formats.
Data and associated files for this tutorial can be downloaded by going to Help > On-line Tutorials from the Partek Genomics Suite main menu or using this link - RNA-Seq Data Analysis tutorial files. Once the zipped data directory has been downloaded to your local drive:
Unzip the downloaded files to C:\Partek Training Data\RNA-seq or to a directory of your choosing. Be sure to create a directory or folder to hold the contents of the zip file.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
During import, you created a categorical attribute called Tissue and assigned the 4 samples to either the muscle or the not muscle group. This step created replicates within each group. The grouping is somewhat artificial and is used in this tutorial only to illustrate ANOVA with a small data set. Replicates are a prerequisite for differential expression analysis using ANOVA.
Select Differential Expression Analysis from the Analyze Known Genes section of the RNA-Seq workflow
The Differential Expression Analysis dialog offers the choice of analyzing at Gene-, Transcript-, or Exon-level.
Select Gene-level
Specify the 1/gene_rpkm (RNA-Seq_results.gene.rpkm) spreadsheet from the Spreadsheet drop-down menu (Figure 1)
Figure 1. Choosing the type of differential expression analysis
Select OK to open the ANOVA dialog
Available factors are listed in the Experimental Factor(s) panel on the left-hand side of the dialog.
Select Tissue, then select Add Factor > to move Tissue to the ANOVA Factor(s) panel on the right-hand side of the dialog (Figure 2)
Figure 2. The ANOVA dialog
If the ANOVA were now performed (without contrasts), a p-value for differential expression would be calculated, but it would only indicate if there are differences within the factor Tissue; it would not inform you which groups are different or give any information on the magnitude of the difference between groups (fold-change or ratio). To get this more specific information, you need to define linear contrasts.
Select Contrasts... to open the Configure dialog
For Select Factor/Interaction, Tissue will be the only factor available as it was the only factor included in the ANOVA model in the previous step; if multiple factors were included, they could be selected in the Select Factor/Interaction: drop-down menu. The levels in this factor are listed on the Candidate Level(s) panel on the left side of the dialog
For this data set, verify that No is selected for Data is already log transformed?
Left click to select muscle from the Candidate Level(s) panel and move it to the Group 1 panel (renamed muscle) by selecting Add Contrast Level > in the top half of the dialog. Label 1 will be changed to the subgroup name automatically, but you can also manually specify the label name
Select not muscle from the Candidate Level(s) panel and move it to the Group 2 panel (renamed not muscle)
The Add Contrast button can now be selected (Figure 3)
Select OK to return to the ANOVA dialog
Figure 3. Defining linear contrasts
Select OK to perform the ANOVA as configured (Figure 4)
Figure 4. Fully configured ANOVA
Once the ANOVA has been performed on each gene in the data set, an ANOVA child spreadsheet ANOVA-1way (ANOVAResults) will appear under the gene_rpkm spreadsheet (Figure 5). The format of the ANOVA spreadsheet is similar for all workflows. Mouse over each column title for a description of the column contents.
Figure 5. Viewing ANOVA results
In this tutorial, the overall p-value for the factor (column 4) is the same as the p-value for the linear contrast (column 5) as there are only two levels within Tissue. If we had more than two groups, the overall p-value and the linear contrast p-values would most likely differ. You can also see the ? symbol in the ratio/fold-change columns (6 and 7) for several genes that also have a low p-value because there are zero reads in one of the groups, thus making it impossible to calculate ratios and fold-changes between groups.
For using ANOVA with more complicated experimental designs, including multiple factors and linear contrasts, please refer to Identifying differentially expressed genes using ANOVA in the Gene Expression Analysis tutorial.
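To make the difference between the overall p-value and the contrast output more concrete, the following is a minimal Python sketch (not Partek's implementation) of a one-way ANOVA F statistic and a ratio/fold-change between two groups. The RPKM values and function names are hypothetical, and the signed fold-change rule used here (the ratio itself when it is at least 1, otherwise -1/ratio) is a common convention assumed for illustration only.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of values.

    Assumes at least some within-group variation (otherwise the
    denominator is zero and F is undefined).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of samples around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def ratio_and_fold_change(group1, group2):
    """Ratio of group means and a signed fold-change (assumed convention).

    Returns (None, None) when either group mean is zero, because the
    ratio cannot be calculated in that case.
    """
    m1, m2 = mean(group1), mean(group2)
    if m1 == 0 or m2 == 0:
        return None, None
    ratio = m1 / m2
    fold_change = ratio if ratio >= 1 else -1 / ratio
    return ratio, fold_change

# Hypothetical RPKM values for one gene: two samples per group
muscle = [10.0, 14.0]      # e.g. Heart, Muscle
not_muscle = [2.0, 4.0]    # e.g. Brain, Liver

f_stat = one_way_anova_f([muscle, not_muscle])
ratio, fold_change = ratio_and_fold_change(muscle, not_muscle)
print(f_stat, ratio, fold_change)
```

When one group has no reads, `ratio_and_fold_change` returns `(None, None)`, which mirrors the ? symbol shown in the ratio and fold-change columns.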
The basic method of creating a gene list from ANOVA results based on fold-change and p-value cut-offs is detailed in Creating gene lists from ANOVA results. Advanced options enable the creation of lists based on more complex criteria. For example, we can use the Create Gene List function to identify transcripts that are both significantly differentially expressed AND alternatively-spliced among the four tissue samples.
Select Create Gene List from the Analyze Known Genes panel of the RNA-Seq workflow to invoke the List Manager dialog
Select the Advanced tab (Figure 1)
Figure 1. Creating a gene list using advanced options
Select Specify New Criteria to invoke the Configure Criteria dialog (Figure 2)
Figure 2. Configuring criteria for transcripts with a p-value < 0.05
In the Configure Criteria dialog box (Figure 2), provide a name for the list (Diff Exp)
Select 1/transcripts (RNA-Seq_results.transcripts) from the Spreadsheet drop-down menu
Select 8. p-value(DiffExp) from the Column drop-down menu
Set Include p-values to significant with FDR with a value of 0.05
A list of 38,285 transcripts that pass these criteria will be generated, according to the # pass score on the right-hand side of the dialog. If the settings are changed, this number will automatically update.
Select OK
Repeat the same steps to create a list of transcripts that are likely alternatively spliced, named Alt Splice, using the same p-value cutoff and Column set to 10. p-value (AltSplice) (Figure 3)
Figure 3. Configuring criteria for a list of alternatively spliced genes
Select OK to generate Alt Splice
Select both lists in the right-hand panel under the Criteria panel while holding the Ctrl key on your keyboard
Select Intersection from the left-hand panel of the List Manager dialog (Figure 4)
Figure 4. Creating a gene list at the intersection of two criteria
Enter a name for the criteria (Diff Exp and Alt Splice)
Select OK to close the naming dialog and OK again to close the list creation hint dialog
Select Save List from the Manage criteria section of the List Manager dialog (Figure 5)
Figure 5. Saving a created list criteria
Select Diff Exp and Alt Splice in the List Creator dialog (Figure 6)
Figure 6. Selecting list to save in List Creator dialog
Select OK to save the list
Select Close to exit the List Manager dialog and view the Diff_Exp_and_Alt_Splice spreadsheet (Figure 7)
Figure 7. A list of the differentially expressed and alternatively spliced genes is now available for downstream analysis
This list of differentially expressed and alternatively spliced transcripts will be used later in the tutorial.
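The logic behind the two criteria and their intersection can be sketched in a few lines of Python. This is only an illustration: the tutorial does not state which FDR procedure the List Manager applies, so the Benjamini-Hochberg step-up adjustment below is an assumption, and the transcript IDs and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def passing(ids, pvalues, cutoff=0.05):
    """Set of IDs whose adjusted p-value is below the cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return {t for t, p in zip(ids, adjusted) if p < cutoff}

# Hypothetical transcripts with made-up p-values
transcripts = ["T1", "T2", "T3", "T4"]
p_diff_exp = [0.001, 0.002, 0.8, 0.01]    # stands in for p-value(DiffExp)
p_alt_splice = [0.004, 0.9, 0.003, 0.02]  # stands in for p-value (AltSplice)

diff_exp = passing(transcripts, p_diff_exp)
alt_splice = passing(transcripts, p_alt_splice)

# Equivalent of the Intersection operation in the List Manager
diff_exp_and_alt_splice = diff_exp & alt_splice
print(sorted(diff_exp_and_alt_splice))
```

Only transcripts that pass both criteria remain in the final set, which is what the Diff Exp and Alt Splice list represents.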
We will be using the RNA-Seq workflow to analyze RNA-Seq data throughout this tutorial. The commands included in the RNA-Seq workflow are also available from the command toolbar, but may be labeled differently.
Select the RNA-Seq workflow by selecting it from the Workflow drop-down menu in the upper right-hand corner of the Partek Genomics Suite window (Figure 1)
Figure 1. Selecting the RNA-seq workflow
The Partek Genomics Suite software can import next generation sequencing data that has been aligned to a reference genome. Two standard types of alignment formats can be imported: .BAM and .SAM. It is also possible to convert ELAND .txt files to .BAM files with the converter found in the Tools menu in the main command bar. The data used in this tutorial was aligned using the Partek® Flow® software and saved as .BAM files.
To import the .BAM files, select Import and Manage Samples from the Import section of the RNA-Seq workflow. The Sequence Import dialog box will open (Figure 2)
Figure 2. Importing .BAM files
Select BAM Files (*.bam) from the Files of type drop-down menu if not set by default
Use the file browser panel on the left-hand side of the Sequence Import dialog or select Browse... to navigate to the folder where you stored the tutorial .BAM files
Files with checked boxes next to the file name will be imported. For this tutorial, select brain_fa, heart_fa, liver_fa, and muscle_fa
Select OK to confirm the file selection and open the next dialog (Figure 3)
Figure 3. Viewing the Sequence Import wizard; specify Output file (and directory using Browse), Species, and Genome
Configure the dialog as shown (Figure 3)
Output file provides a name for the top-level spreadsheet. Browse can be used to change the output directory.
Select Homo sapiens from the Species drop-down menu
This will allow us to select a human genome reference assembly alignment.
Select hg19 for Genome/Transcriptome reference used to align the reads
This is the reference genome our tutorial data was aligned to using Partek Flow.
Select OK to open the BAM Sample Manager dialog (Figure 4)
Figure 4. Bam Sample Manager dialog
The Bam Sample Manager dialog allows additional samples to be added or removed after the initial sample import. To remove a sample, select a sample from the list and then select Remove selected samples. This dialog also allows us to modify samples.
Select Manage samples to open the Assign files to samples dialog
Sample ID is by default set to the file name, which may be too long or uninformative, so the Assign files to samples dialog can be used to give informative names to samples.
Change the sample names to Brain, Heart, Liver, and Muscle as shown (Figure 5)
The Assign files to samples dialog also allows multiple .BAM files to be merged into one sample. This is useful if reads from one sample are split into multiple .BAM files.
Figure 5. Changing sample names using the Assign files to samples dialog
Select OK to close the Assign files to samples dialog
Select Close to exit the Bam Sample Manager dialog and view the imported data (Figure 6)
Figure 6. Viewing the imported data in a spreadsheet
Additional files can be added to this spreadsheet using the Bam Sample Manager dialog. The Bam Sample Manager dialog can also be used to add imported samples to a separate spreadsheet by selecting a new option in the dialog, Add new experiment.
Once imported, it is possible to visualize the mapped reads along with gene annotation information and cytobands.
Select the parent spreadsheet 1 (RNA-seq)
Select Chromosome View in the Visualization section of the RNA-Seq workflow panel
Unless you have previously downloaded an annotation file, you will be prompted to select an annotation source.
Select RefSeq Transcripts - 2017-05-02
Partek Genomics Suite will download the relevant file and save it to your default library location. The Chromosome View tab will open with chromosome 1 displayed (Figure 1)
Figure 1. Visualizing reads on a chromosome level in Chromosome View
In Chromosome View you can choose other chromosomes from the position field drop-down menu (Figure 2) to change which chromosome is displayed. You may also type a search term (e.g. gene symbol or transcript ID) directly into the position field.
Figure 2. Choosing a chromosome to view in Chromosome View
The Tracks panel contains the following tracks:
RefSeq Transcripts (+)
The RefSeq Transcripts (+) track shows all genes encoded on the forward strand of the currently selected chromosome. This experiment uses RefSeq Transcripts, which defines genomic sequences of well-characterized genes, as the reference annotation track. Mouse-over a particular region in this track, and all genes within this region are shown in the information bar. Zoom in on this track to see individual genes, including alternative isoforms.
RefSeq Transcripts (-)
The RefSeq Transcripts (-) track shows all the genes encoded on the reverse strand of the currently selected chromosome.
Legend Base Colors
The Legend Base Colors track shows the color for each nucleotide. Colored nucleotide bases become visible in the Bam Profile tracks at higher levels of magnification. By default, the colors are set to red for adenine (A), blue for cytosine (C), yellow for guanine (G), green for thymine (T), and black for base not called (N). The color of the bases can be configured by selecting the Legend Base Colors track and selecting Configure colors in the track configuration panel beneath the Tracks panel on the left-hand side of the Genome Viewer.
Bam Profile (Heart, Brain, Muscle, Liver)
The Bam Profile tracks show all the reads that mapped to the currently selected chromosome from the four tissue samples. The y-axis numbers on the left side of the tracks indicate the raw read counts. The aligned reads are shown in the Genome Viewer in each track with a different color for each Bam Profile track.
Genome Sequence, Cytoband, and Genomic Label
The Genome Sequence, Cytoband and Genomic Label tracks are shown at the bottom of the panel. These labels are helpful for navigating the chromosome.
We are now ready to measure gene expression in our dataset. To do this, we will use the mRNA quantification task in the Analyze Known Genes section of the RNA-Seq workflow. mRNA quantification creates spreadsheets showing expression at exon, transcript, and gene levels and reports raw and normalized reads for each sample.
Please note that the normalization method used by Partek Genomics Suite is Reads Per Kilobase per Million mapped reads (RPKM) (Mortazavi et al. 2008). In brief, this normalization method counts total reads in a sample, divides by one million to create a per million scaling factor for each sample; then divides the read counts for the feature (exon, transcript, or gene) by the per million scaling factor to normalize for sequencing depth and give a reads per million value; and finally divides reads per million values by the length of the feature (exon, transcript, or gene) in kilobases to normalize for feature size.
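The three steps described above can be written as a short Python function. This is a minimal sketch of the arithmetic only, with hypothetical function and parameter names and made-up numbers; it is not taken from Partek Genomics Suite.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one feature."""
    # Step 1: per-million scaling factor for the sample
    per_million = total_mapped_reads / 1_000_000
    # Step 2: normalize for sequencing depth (reads per million)
    reads_per_million = read_count / per_million
    # Step 3: normalize for feature length in kilobases
    return reads_per_million / (feature_length_bp / 1_000)

# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a sample with 10 million mapped reads
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Because both the sample total and the feature length appear in the denominator, the same raw read count yields a lower RPKM for a longer feature or a more deeply sequenced sample.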
Select 1 (RNA-Seq) from the spreadsheet tree
Select mRNA quantification in the Analyze Known Genes section of the RNA-seq workflow
The RNA-Seq Quantification dialog will appear (Figure 1).
Select RefSeq Transcripts 2017-05-02 from the mRNA section of the Specify a database of genomic features to quantify panel of the dialog
Your choices in the Configure the test panel of the dialog depend on the design and aims of your experiment. A detailed description of each option can be viewed by selecting the icon next to it.
For Strand-specificity: select No
Your choice here depends on the method used for sample preparation. A directional mRNA-seq sample preparation protocol only synthesizes the first strand of cDNA whereas other methods reverse transcribe the mRNA into double-stranded cDNA. If double-stranded cDNA has been synthesized, the sequencer reads sequences from both the forward and reverse strands but does not discriminate between them, eliminating strand information. When strand information is preserved, it is possible for paired-end sequences to come from a combination of the forward and reverse strands. If in doubt, select Auto-detect from the drop-down list. The data for this tutorial did not preserve strand information so we selected No.
For In the gene-level result report intronic reads as compatible with the gene?, select No
Selecting Yes would include intronic reads in the gene-level results, which might be useful for discovering unannotated transcripts for known genes, and also includes introns in the RPKM calculation for the gene-level results.
For Require strict paired-end compatibility select No
Selecting Yes would require that two alignments from the same read must map to the same transcript to be considered compatible. However, the data set used in this tutorial consists of single-end reads so this option is unnecessary.
For Report results with no reads from any sample? select No
Selecting Yes would include all the genes/transcripts/exons in the transcriptome, even if there are no reads for that feature from any sample.
Make sure Report unexplained regions with more than ___ reads is selected and specify 5 as the number of reads
This option will create a spreadsheet that includes all regions with a specified number of reads that map to the genome, but not to any feature included in the selected database of genomic features.
Select Report exon-level results
If selected, spreadsheets will be created describing expression at the exon level.
Your RNA-Seq Quantification dialog should now be configured as shown (Figure 1). Descriptions of the spreadsheets that can be created by mRNA Quantification can be viewed by selecting Describe results to bring up the Quantification Result Help dialog.
Figure 1. Configuring the RNA-Seq Quantification dialog
Select OK to perform the RNA-Seq quantification
Reads will now be assigned to individual transcripts of a gene based on the Expectation/Maximization (E/M) algorithm (Xing, et al. 2006). In Partek Genomics Suite software, the E/M algorithm is modified to accept paired-end reads, junction aligned reads, and multiple aligned reads if these are present in your data. For a detailed description of the E/M algorithm, refer to the RNA-Seq white paper (Help > On-line Tutorials > White Papers). Several spreadsheets containing the analyzed results will be generated. Progress bars in the lower left-hand corner of the RNA-Seq Quantification window and the main window will update as the data is analyzed.
If you have not disabled it, the Quantification Result Help dialog will appear. Select Close
The Analysis tab now shows the spreadsheets created by mRNA Quantification in the spreadsheet tree as child spreadsheets of 1 (RNA-seq) (Figure 2).
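For readers unfamiliar with the general idea, the following toy Python sketch shows how an E/M procedure can split ambiguous reads among isoforms. It is a deliberately simplified illustration under stated assumptions (no transcript length correction, no paired-end or junction handling) and is not the modified algorithm used by Partek Genomics Suite. The function name, transcript labels, and read data are hypothetical.

```python
def em_abundances(reads, transcripts, iterations=50):
    """Estimate relative transcript abundances from read compatibility sets.

    reads: list of sets, each holding the transcripts a read is compatible with.
    Returns a dict mapping transcript -> estimated fraction of reads.
    """
    # Start from equal abundances
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(iterations):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            if total == 0:
                continue
            for t in compatible:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts
        theta = {t: counts[t] / len(reads) for t in transcripts}
    return theta

# Hypothetical gene with two isoforms, A and B:
# 3 reads map only to A, 1 only to B, 2 are compatible with both
reads = [{"A"}] * 3 + [{"B"}] * 1 + [{"A", "B"}] * 2
theta = em_abundances(reads, ["A", "B"])
print(theta)
```

In this toy case the estimates settle near 0.75 for A and 0.25 for B: the two ambiguous reads are shared out in the same 3:1 proportion suggested by the uniquely assigned reads.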
Figure 2. Viewing the results of mRNA Quantification
The _reads and _rpkm spreadsheets
Data on features - genes, transcripts, and exons - are presented before and after normalization as _reads and _rpkm spreadsheets. In this tutorial, we have created exon_reads, exon_rpkm, gene_reads, gene_rpkm, transcript_reads, and transcript_rpkm spreadsheets. In these spreadsheets, samples are listed one per row and the normalized counts of the reads mapped to features are in columns (Figure 2).
The transcripts spreadsheet
The transcripts spreadsheet lists a transcript in each row.
It is possible to derive basic information from the RNA-Seq_result.transcripts spreadsheet about differential expression and alternative splicing between your samples, even if you don’t have replicates, using simple chi-squared or log-likelihood tests. This works because each sample is represented only once and we can assume a null hypothesis that the transcripts are evenly distributed across all samples. However, the power of Partek Genomics Suite software resides in the implementation of a mixed-model ANOVA that can handle unbalanced and incomplete datasets, nested designs, numerical and categorical variables, any number of factors, and flexible linear contrasts when you do have biological replicates.
The unexplained_regions spreadsheet
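As a rough illustration of the null hypothesis mentioned above, the following Python sketch computes a chi-squared goodness-of-fit statistic for one transcript's read counts across four samples, with equal expected counts. It is a simplified example under that assumption (it ignores differences in sequencing depth between samples) and is not necessarily the exact test Partek Genomics Suite computes. The read counts and function name are hypothetical.

```python
def chi_squared_even(observed):
    """Chi-squared statistic against an even distribution across samples."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical read counts for one transcript in Brain, Heart, Liver, Muscle
observed = [10, 20, 30, 40]
statistic = chi_squared_even(observed)

# Critical value of the chi-squared distribution for 3 degrees of freedom
# (4 samples - 1) at a significance level of 0.05
CRITICAL_DF3_ALPHA_05 = 7.815
print(statistic, statistic > CRITICAL_DF3_ALPHA_05)
```

A statistic above the critical value suggests that the reads for this transcript are not evenly distributed across the samples.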
The contents of this spreadsheet are explained in more detail in a later section of the tutorial - Analyzing the unexplained regions spreadsheet.
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods, 2008; 5: 621-8.
Xing, Y., Yu, T., Wu, Y.N., Roy, M., Kim, J., & Lee, C. An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs. Nucleic Acids Research, 2006; 34: 3150-3160.
During a previous section of this tutorial, a spreadsheet named unexplained_regions was generated. This spreadsheet contains locations where reads map to the genome but are not annotated by the transcript database, in this case, RefSeqGene. The unexplained_regions spreadsheet is potentially very interesting as it may contain novel findings.
Right click column 6. Average Coverage and select Sort Descending from the menu
Select Find Overlapping Genes from the Tools option in the command toolbar (Figure 1)
Figure 1. Selecting Find Overlapping Genes from Tools in the command toolbar
Select Add a new column with the gene nearest to the region in the Find Overlapping Genes dialog (Figure 2)
Select OK
Figure 2. Find Overlapping Genes
Select RefSeq Transcripts - 2017-05-02 from the Output Overlapping Features dialog (Figure 3)
Please note that it is recommended that you annotate with the same database used when you performed mRNA quantification.
Select OK
Figure 3. Select the database to search for overlapping features
The closest overlapping feature and the distance to it are now included as columns 7. Overlapping Features and 8. Nearest Features in the unexplained_regions spreadsheet.
Right-clicking on a row header and selecting Browse to Location will show the reads mapped to the chromosome. For this tutorial, a couple of genes are selected to show regions that are located after a known gene or in the intron of a gene.
Right-click row 39 and select Browse to location from the pop-up menu
Select the Chromosome View tab to view a region within an intron of UNC45B. This may be a novel exon (Figure 4)
Figure 4. A region within an intron of UNC45B that might be a novel exon
Right-click row 12576 and select Browse to location to go to a region that starts 1 bp after CD82.
This peak may represent an extended exon (Figure 5).
Figure 5. A region that starts 1 bp after CD82 that might represent an extended exon
While RefSeq was used to identify overlapping features, the choice of which database to use will depend on the biological context of your experiment. For example, you may wish to utilize promoter or miRNA databases if you are interested in regulation of expression.
Chromosome View in the Partek Genomics Suite software enables visualization of differential expression and alternative splicing results in RNA-Seq data.
Select New Track
Select Add a track from spreadsheet and select 1/transcripts (RNA-Seq_results.transcripts) from the drop-down menu
Select Next > (Figure 1)
Figure 1. Adding a new track to Chromosome View
The new track will be added to Chromosome View (Figure 2).
Figure 2. Viewing isoform proportion track in Chromosome View
At this point, you may find it useful to alter track properties. Each track can be individually configured. For example, isoform information will be easier to visualize if we remove a few tracks.
Select Cytoband (hg19) in the Tracks panel
Select Remove Track to remove it from the viewer
Repeat for Genomic Label, RefSeq Transcripts - 2017-05-02 (hg19) (-), Legend: Base Colors, and Genome Sequence
Next, we are going to view a single gene, SLC25A3, with differentially expressed isoforms.
Type SLC25A3 in the Plot Position bar at the top of the window and hit Enter. The browser will browse to the gene
To further improve our visualization of SLC25A3 isoforms, we can modify the remaining tracks.
Select RefSeq Transcripts - 2017-05-02 (hg19) (+) from the Tracks panel
Change Track height to 60 using the slider
Select Apply to change track height
Repeat steps to set each Bam Profile track to a height of 40 to complete our changes
Move the Isoform proportion track to below the RefSeq Transcripts track by selecting and dragging it up the list (Figure 3)
Figure 3. Changing tracks in Chromosome View to facilitate visual analysis of isoform proportions
The Muscle, Brain, Heart, Liver, and genomic label tracks were described in a previous section. Here, the focus is on the Isoform proportion track, which visualizes differential expression and alternative splicing. The reads that are mapped to a certain sample and the proportion of the transcript expressed in that sample are colored to match the Bam Profile track of that sample. In this screenshot, Brain is yellow, Heart is green, Liver is red, and Muscle is orange.
SLC25A3 was reported by Wang, et al., (Nature, 2008) to have “mutually exclusive exons (MXEs)”. The reads mapped to the 3 transcripts of this gene in each of the tissue samples are shown in the Genome Viewer in the isoform proportion track. The relative abundances of the individual transcripts of this gene are shown by the height of the color coded bars on each transcript in the isoform proportion track. Note transcript NM_213611 has low expression while transcripts NM_005888 and NM_002635 have higher expression. Also note that NM_005888 is expressed primarily in the heart and muscle, indicated by the primarily green and orange bars, while NM_002635 is expressed primarily in the brain and liver, indicated by the primarily yellow and red bars.
For additional tips on using the Chromosome View, refer to Visualizing mapped reads with Chromosome View.
Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., & Burge, C.B. Alternative isoform regulation in human tissue transcriptomes. Nature, 2008; 456: 470-6.
With the GO Enrichment feature in Partek Genomics Suite, you can take a list of significantly expressed genes/transcripts and find GO terms that are significantly enriched within the list. For a detailed introduction to GO Enrichment, refer to the GO Enrichment User Guide (Help > On-line Tutorials > User Guides).
Select the Diff_Exp_and_Alt_Splice spreadsheet from the spreadsheet tree
Select Gene Set Analysis in the Biological Interpretation section of the RNA-Seq workflow (Figure 1)
Figure 1. Selecting Gene Set Analysis
Select GO Enrichment in the Gene Set Analysis dialog (Figure 2)
Select Next >
Figure 2. Selecting the method of analysis
Select the spreadsheet 1/Diff_Exp_and_Alt_Splice (Diff Exp and Alt Splice.txt) from the drop-down menu (Figure 3)
Select Next >
Figure 3. Selecting the spreadsheet that contains the genes you want to test
Select Use Fisher's Exact test
Select Invoke gene ontology browser on the result
Set Restrict analysis to functional groups with more than _ genes to 2 (Figure 4)
Select Next >
Figure 4. GO Enrichment options
Select Default mapping file (Figure 5)
Select Next >
Figure 5. Selecting the mapping file
A GO-Enrichment spreadsheet, as well as a browser (Figure 6), will be generated with the enrichment score shown for each GO term. Browse through the results to find a functional group of interest by examining the enrichment scores. The higher the enrichment score, the more overrepresented this functional group is in the input gene list. Alternatively, you may use the Interactive filter on the GO-Enrichment spreadsheet to identify functional groups that have low p-values and perhaps a higher percentage of genes in the group that are present.
Figure 6. Viewing the Gene Ontology Browser
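To give a sense of what a Fisher's exact test measures in this setting, the following Python sketch computes a one-sided p-value for the overrepresentation of one GO term in a gene list, using the hypergeometric tail. It is a generic illustration with hypothetical numbers and names; the tutorial does not describe how Partek Genomics Suite defines its background set or derives the enrichment score from the test, so neither is modeled here.

```python
from math import comb

def fisher_overrepresentation_p(background, in_term, list_size, overlap):
    """One-sided Fisher's exact p-value for overrepresentation.

    background: total number of genes considered
    in_term:    genes in the background annotated with the GO term
    list_size:  number of genes in the input list
    overlap:    genes in the list annotated with the GO term
    Returns the probability of seeing at least `overlap` annotated genes
    in a random list of the same size.
    """
    total = comb(background, list_size)
    upper = min(in_term, list_size)
    tail = sum(
        comb(in_term, k) * comb(background - in_term, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 background genes, 5 annotated with the term,
# an input list of 4 genes, 3 of which carry the annotation
p = fisher_overrepresentation_p(20, 5, 4, 3)
print(p)
```

A small p-value indicates that the term appears in the list more often than expected by chance, which is the situation a high enrichment score is meant to flag.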
Now that the data has been imported, we need to make a few changes to the data annotation before analysis.
Notice that the Sample ID names in column 1 are gray (Figure 1). This indicates that Sample ID is a text factor. Text factors cannot be used as a variable in downstream analysis so we need to change Sample ID to a categorical factor.
Figure 1. Viewing the imported data in a spreadsheet
Right-click on the column header to invoke the pop-up menu
Select Properties (Figure 2)
Figure 2. Changing column properties
Configure the Properties of Column 1 in Spreadsheet 1 dialog as shown (Figure 3) with Type set to categorical and Attribute set to factor
Figure 3. Changing column 1 properties
Select OK
The sample names in column 1 are now black, indicating that they have been changed to a categorical variable. Next, we will add attributes for grouping the data.
From the RNA-seq workflow panel, select Add Sample Attributes to bring up the Add Sample Attributes dialog (Figure 4)
Figure 4. Add Sample Attributes dialog
Select Add a Categorical Attribute
Select OK to bring up the Create categorical attribute dialog
Creating a categorical sample attribute allows us to group samples. This is useful for designating samples as replicates, as members of an experimental group, or as sharing a phenotype of interest. In this tutorial, we have four different samples from different tissues and different donors, but to illustrate the available statistical analysis options, we need to divide the samples into two groups: muscle (Heart and Muscle) and not muscle (Brain and Liver).
Set Attribute name: as Tissue
Rename Group 1 to muscle and Group 2 to not muscle
Select and drag the samples from the Unassigned panel to the correct group panel (Figure 5)
Figure 5. Creating a categorical attribute
Select OK
Select No from the Add another attribute? dialog
Select Yes from the Save spreadsheet 1 dialog
The attribute will now appear as a new column in the RNA-seq spreadsheet with the heading Tissue and the groups muscle and not muscle.
The next available step in the Import panel of the RNA-seq workflow is Choose Sample ID Column. Verifying that the correct column is designated as the Sample ID becomes particularly important when data from multiple experiments is being combined.
Select Choose Sample ID Column from the Import panel of the RNA-Seq workflow
Select OK (Figure 6)
Figure 6. Choosing the correct column as Sample ID
The _reads and _rpkm spreadsheets can be used for data analysis. Sample grouping can be visualized using PCA. Select View > Scatter Plot from the toolbar or use the corresponding icon on the quick action bar to create a PCA plot from the selected spreadsheet. See Exploring gene expression data for an example of using PCA plots for data analysis or consult Chapter 7 of the Partek User's Manual for a detailed introduction to PCA. With replicates in a sample group, you would also be able to use the _rpkm spreadsheet to perform differential expression analysis using ANOVA.
Select the zoom out icon several times to zoom out slightly