Partek Flow can analyze shotgun metagenomic sequencing data to characterize microbial communities.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA sequences, typically from microbiome or metagenomic studies (1). To perform its classifications, Kraken matches k-mers (nucleotide sequences of k bases in length) within a read to a database of k-mer sequences from known genomes with established taxonomic relationships. Each read is classified to a best-match location in a taxonomic tree (the lowest common ancestor), so not all sequences will be classified to a particular level such as species. Partek Flow currently wraps Kraken version 2.0.8-beta (2), but Kraken 0.10.5-beta is also available (to roll back to the older version, see Task Management).
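The k-mer matching idea described above can be illustrated with a toy example. This is a minimal sketch, not Kraken's actual implementation: the taxonomy, the 4-base k-mer database, and the function names are all hypothetical, and Kraken 2 uses a far more compact index. Each k-mer in a read is looked up in the database, the hits are summed along each path to the root, and ties between equally good taxa are resolved by reporting their lowest common ancestor.

```python
from collections import Counter
from functools import reduce

K = 4  # toy k-mer length (real databases use much longer k-mers)

# Hypothetical taxonomy: child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
}

# Hypothetical database: each k-mer maps to the lowest common ancestor
# of all genomes that contain it.
KMER_TO_TAXON = {
    "ACGT": "E. coli",
    "CGTA": "Escherichia",
    "GTAC": "E. coli",
    "TTTT": "E. albertii",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Return the lowest common ancestor of two taxa."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon


def classify(read):
    """Return the best-supported taxon for a read, or None if unclassified."""
    hits = Counter()
    for i in range(len(read) - K + 1):
        taxon = KMER_TO_TAXON.get(read[i:i + K])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None

    def path_score(taxon):
        # Total k-mer hits on the path from this taxon up to the root.
        return sum(hits[t] for t in lineage(taxon))

    best = max(path_score(t) for t in hits)
    winners = [t for t in hits if path_score(t) == best]
    # Equally supported taxa collapse to their lowest common ancestor.
    return reduce(lca, winners)


print(classify("ACGTAC"))   # species-level support
print(classify("ACGTTTT"))  # conflicting species hits
print(classify("GGGGGG"))   # no k-mers in the database
```

In this sketch the second read has one k-mer from each of two species, so it can only be placed at the genus level. This is the situation in which a read is not classified down to species.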
Kraken takes FASTQ files as input. Reads can be single- or paired-end.
Click a data node with FASTQ files
Click the Metagenomics section of the toolbox
Click Kraken
Choose a Kraken database
Configure parameters
Click Finish to run (Figure 1)
Kraken generates a Taxonomic data node. This data can be used as input for the Alpha & Beta diversity and Choose taxonomic level tasks. If you want to obtain the Kraken output files, select the new data node and choose Download data from the toolbox on the right.
Partek distributes the Kraken databases Bacteria, Human, Plasmids, Viruses, and MiniKraken. MiniKraken includes sequences from bacterial, archaeal, and viral genomes in RefSeq; however, it contains only 2.7% of the k-mers from the original database. Running Kraken with the MiniKraken database is significantly faster and less resource-intensive than using a full database, but the results will be less complete.
If enabled, an Unaligned reads data node including reads that could not be classified by Kraken is produced. Default is Disabled.
If enabled, uses the first hit or hits (--quick). Default is Disabled.
The Kraken task report presents a table and graph summarizing the results (Figure 2)
The table lists each sample and gives the following values:
The number of reads that were classified by Kraken for each sample.
The number of reads that were not classified by Kraken for each sample.
Lists the number of different taxa that were detected within each taxonomic level among the classified reads for each sample. For example, in Figure 2, there were 139 different species detected in Sample 1.
The stacked bar chart shows the abundance or relative abundance of the different phyla. The legend lists the color of each phylum. Mouse over a bar to view the breakdown of families within each phylum (Figure 3). Use the radio button to switch between Absolute abundance and Relative abundance.
Clicking a section of the pie chart zooms in to that section by setting the selected level as the root (Figure 5).
The mini-map on the upper left is shaded in green to indicate which section of the original pie chart is currently shown (Figure 5).
1. Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.
2. Wood DE, Lu J, Langmead B: Improved metagenomic analysis with Kraken 2. Genome Biology 2019, 20:257.
Alpha & beta diversity are measures of microbial population diversity.
Alpha diversity is a measure of the species diversity within a sample. Partek Flow can calculate two of the most commonly used alpha diversity metrics: Shannon index (1) & Simpson index (2). The Shannon index takes into account the number of different species (richness) and how evenly the counts are distributed among species. The higher the Shannon index value, the higher the species diversity. The Simpson index gives more weight to common or dominant species and scales from 0 to 1. The closer to 1, the higher the species diversity.
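As an illustration of the two metrics, the following minimal sketch computes both from a list of species-level read counts for one sample. It assumes the natural logarithm for the Shannon index and the 1 - sum(p^2) form of the Simpson index, which matches the 0 to 1 scale described above; the exact formulas used by Partek Flow are not stated here, and the function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index: -sum(p * ln p) over species with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no classified reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson index in the 1 - sum(p^2) form (assumed), ranging from 0 to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no classified reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


even = [10, 10, 10, 10]    # four species, evenly distributed
skewed = [37, 1, 1, 1]     # four species, one dominant

print(shannon_index(even), simpson_index(even))
print(shannon_index(skewed), simpson_index(skewed))
```

Both samples have the same richness (four species), but the evenly distributed sample scores higher on both indices, which shows how evenness contributes to the diversity value.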
Beta diversity is a measure of between-sample diversity. A dissimilarity value is calculated for each pair of samples, and the resulting distance matrix is fed through Principal Coordinates Analysis (PCoA) for visualization. This allows you to see how samples cluster together based on how similar or different their microbial communities are. Partek Flow calculates beta diversity using two dissimilarity metrics: Bray-Curtis coefficient (3,4) and Jaccard binary index (5,6). Bray-Curtis is a quantitative method, meaning it takes the abundance (i.e. the read counts) of each species into account when calculating dissimilarity. Jaccard binary index is a qualitative method, meaning it is based on the presence/absence of species and looks at the species overlap between two samples.
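The difference between the quantitative and qualitative metrics can be seen in a small worked example. The sketch below uses the standard textbook definitions of Bray-Curtis dissimilarity and Jaccard distance on presence/absence data; it is not taken from Partek Flow, and the sample data and function names are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (dicts of species -> reads)."""
    species = set(a) | set(b)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    differences = sum(abs(a.get(s, 0) - b.get(s, 0)) for s in species)
    return differences / total


def jaccard_binary(a, b):
    """Jaccard distance using only the presence or absence of each species."""
    present_a = {s for s, count in a.items() if count > 0}
    present_b = {s for s, count in b.items() if count > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1.0 - len(present_a & present_b) / len(union)


# Hypothetical species-level read counts for two samples.
sample_1 = {"species A": 10, "species B": 0, "species C": 5}
sample_2 = {"species A": 5, "species C": 5, "species D": 10}

print(bray_curtis(sample_1, sample_2))     # uses read counts
print(jaccard_binary(sample_1, sample_2))  # uses presence/absence only
```

Both functions return 0 for identical samples and 1 for samples with no species in common. Only the Bray-Curtis value changes when read counts shift without any species appearing or disappearing.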
Quantitative approaches are generally more powerful in beta diversity assessment because the abundance data is more information-rich than presence/absence data. It can still be useful to compare quantitative and qualitative beta diversity results. For example, Kuczynski et al. (7) showed that qualitative methods can perform well on distinctly clustered samples but badly on subtly clustered samples, whereas quantitative methods can detect more subtle clusters. Thus, if a qualitative method (Jaccard) does not identify clusters and a quantitative method (Bray-Curtis) does, you can infer that the observed clusters are more subtle.
The task can be performed on a Taxonomic data node, which is the output from a Kraken task. Alpha & beta diversity estimates are performed on species-level read counts.
Click a Taxonomic data node
Choose Alpha & beta diversity from the Metagenomics section of the toolbox
(optional) If you wish to compare the alpha diversity values between groups of samples, choose the factors to set up the ANOVA. This can be helpful if you want to see whether there is a significant difference in species diversity between predefined groups.
Click Finish
If there is only one sample present, beta diversity will not be calculated and there will be no option to calculate the alpha diversity ANOVA.
The task report is stored in a rectangular task node (Figure 1).
The task report has two tabs: Alpha diversity report and Beta diversity report.
The table at the top summarizes the Shannon and Simpson index for each sample. The table can be downloaded as a tab-delimited text file by clicking Download in the right corner of the table.
If the ANOVA was set up, a table shows the results of the statistical analysis (Figure 2); otherwise, no table is shown. A separate ANOVA test is performed for each alpha diversity metric to see if there is any significant difference between the groups specified in the contrasts. See the GSA documentation for an explanation of each column.
At the bottom, there is a bar chart summarizing the Shannon and Simpson metrics for each sample (Figure 4).
To view the beta diversity results, click the Beta diversity report tab at the top.
The beta diversity results are presented in a Data viewer session, with two PCoA plots on the canvas. One plot shows the clustering based on the Bray-Curtis coefficient distance metric. The other shows the clustering based on the Jaccard index distance metric. In both plots, each point is a different sample, and samples cluster together based on how similar their overall metagenomic profiles are. Points that are close together are similar; points that are far apart are different. The Configuration panel on the left can be used to customize the plots.
To obtain the table of pairwise comparisons (dissimilarity matrix) for each distance metric, click the appropriate hyperlink below the Data viewer. The table will be downloaded as a tab-delimited text file. For both distance metrics, the values range from 0 to 1. The higher the value, the more different the two samples are. The lower the value, the more similar they are.
1. Shannon CE: A mathematical theory of communication. Bell System Technical Journal 1948, 27.
2. Simpson EH: Measurement of diversity. Nature 1949, 163:688.
3. Bray JR, Curtis JT: An ordination of upland forest communities of southern Wisconsin. Ecol Monogr 1957, 27.
4. Beals E: Bray-Curtis ordination: an effective strategy for analysis of multivariate ecological data. Adv Ecol Res 1984, 14.
5. Jaccard P: Lois de distribution florale. Bulletin de la Société Vaudoise des Sciences Naturelles 1902, 38.
6. Jaccard P: The distribution of the flora in the alpine zone. New Phytologist 1912, 11:2.
7. Kuczynski J, Liu Z, Lozupone C, McDonald D, Fierer N, Knight R: Microbial community resemblance methods differ in their ability to detect biologically relevant patterns. Nat Methods 2010, 7:10.
Clicking the icon in the View column of the table opens an interactive Hierarchical pie chart that can be used to explore the taxonomies present in each sample (Figure 4). Mousing over a section of the pie chart gives the number of reads classified to this level, the number of reads in its children, the percentage of total reads represented by this group, and the percentage of the reads in the root that are represented by this group.
Click the green arrow to move up one taxonomic level (set the root to one level higher). Click reset to show the entire pie chart.
The Choose taxonomic level task generates a count matrix summarizing the number of reads that have been classified by Kraken for each taxon in each sample, at a given taxonomic level. The counts give a measure of the relative abundance of each taxon, which can be used for downstream analysis and visualization as if it were gene expression count data.
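Conceptually, building such a count matrix amounts to summing reads by the taxon they belong to at the chosen level. The sketch below is an illustration under stated assumptions, not Partek Flow's implementation: the input structure and names are hypothetical, and it assumes that a read classified below the chosen level is counted toward its ancestor at that level, while a read classified only above that level is left out.

```python
from collections import Counter

# Hypothetical classifications: for each sample, a list of
# (lineage of the assigned taxon, number of reads) pairs.
SAMPLES = {
    "Sample 1": [
        ({"phylum": "Proteobacteria", "family": "Enterobacteriaceae",
          "species": "Escherichia coli"}, 120),
        ({"phylum": "Proteobacteria", "family": "Enterobacteriaceae"}, 30),
        ({"phylum": "Firmicutes", "family": "Bacillaceae",
          "species": "Bacillus subtilis"}, 50),
    ],
    "Sample 2": [
        ({"phylum": "Firmicutes", "family": "Bacillaceae",
          "species": "Bacillus subtilis"}, 80),
        ({"phylum": "Proteobacteria"}, 20),
    ],
}


def count_matrix(samples, level):
    """Sum reads per taxon at one taxonomic level for every sample."""
    per_sample = {}
    for name, records in samples.items():
        counts = Counter()
        for taxon_lineage, reads in records:
            taxon = taxon_lineage.get(level)
            if taxon is not None:  # skip reads classified above this level
                counts[taxon] += reads
        per_sample[name] = counts
    # Use the same taxa for every sample so the result is rectangular.
    taxa = sorted(set().union(*per_sample.values()))
    return {name: {t: counts[t] for t in taxa}
            for name, counts in per_sample.items()}


print(count_matrix(SAMPLES, "family"))
print(count_matrix(SAMPLES, "species"))
```

In this example the 30 reads placed only at the family level contribute to the family matrix but not to the species matrix, which is why totals can differ between levels.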
The task can be performed on a Taxonomic data node, which is the output from a Kraken task.
Click a Taxonomic data node
Select Choose taxonomic level from the Metagenomics section of the toolbox
Check one or more taxonomic levels. The options are Superkingdom, Kingdom, Phylum, Class, Order, Family, Genus, and Species (Figure 1). A separate output data node will be generated for each selected level (Figure 2).
Click Finish
The choice of taxonomic level depends on your research question and on the level at which you want to perform downstream analysis. For example, if you want to know which families of bacteria are the most abundant in your sample, choose the family level. If you want to see which species are differentially abundant in different groups of samples, choose the species level.
To export the count matrix for a taxonomic level, select the output data node and choose Download data from the toolbox. You can choose to put the features on the columns or rows (Figure 3). The 'features' in this context are the taxa. For example, if a Phylum data node is downloaded, the features will be different phyla. The download will be a tab-delimited text file with read counts for each sample (Figure 4).
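A downloaded count matrix can be read with any tool that handles tab-delimited text. The sketch below is a minimal example using Python's standard csv module. It assumes the features (taxa) are on the columns and that the first column holds the sample name; the column names and values are made up for illustration, and the real file may contain additional columns.

```python
import csv
import io

# Hypothetical file contents: samples on rows, phyla on columns.
EXAMPLE_TSV = (
    "Sample\tFirmicutes\tBacteroidetes\tProteobacteria\n"
    "Sample 1\t600\t300\t100\n"
    "Sample 2\t50\t150\t0\n"
)


def read_count_matrix(handle):
    """Parse a tab-delimited count matrix into {sample: {taxon: reads}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    taxa = header[1:]
    return {
        row[0]: dict(zip(taxa, (int(value) for value in row[1:])))
        for row in reader
        if row
    }


def relative_abundance(counts):
    """Convert the read counts of one sample into fractions of its total."""
    total = sum(counts.values())
    return {taxon: (reads / total if total else 0.0)
            for taxon, reads in counts.items()}


# For a real download, replace io.StringIO(...) with open(path, newline="").
matrix = read_count_matrix(io.StringIO(EXAMPLE_TSV))
for sample, counts in matrix.items():
    print(sample, relative_abundance(counts))
```

Dividing each count by the sample total is one simple way to compare samples sequenced to different depths.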
The taxon-level count data node(s) behave like any other count matrix in Partek Flow. This means you can perform most of the tasks you would normally perform on gene expression data. For example, you can normalize the counts, perform principal components analysis (PCA), and use ANOVA to detect differentially abundant species in different groups of samples (Figure 5). Additional visualizations can also be generated including heatmaps, volcano plots, dot plots, and more.