Metagenomics

Partek Flow can analyze shotgun metagenomic sequencing data to characterize microbial communities.

Kraken

What is Kraken?
Running Kraken
Kraken parameters

What is Kraken?

Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA sequences, typically from microbiome or metagenomic studies (1). To classify a read, Kraken matches the k-mers within it (nucleotide sequences k bases long) against a database of k-mers from known genomes with established taxonomic relationships. Each read is assigned to its best-match location in the taxonomic tree (the lowest common ancestor), so not every sequence is classified down to a particular level such as species. Partek Flow currently wraps Kraken version 2.0.8-beta (2); Kraken 0.10.5-beta is also available if you need to roll back to the older version.
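
The idea of k-mer matching followed by a lowest-common-ancestor assignment can be illustrated with a toy sketch. This is not Kraken's implementation: the taxonomy, the 4-mer database, and the function names are all hypothetical, and Kraken's real scoring of k-mer hits is more involved (see reference 1). The sketch simply assigns a read to the lowest common ancestor of every taxon hit by its k-mers.

```python
# Toy illustration only; all names and data below are hypothetical.

# Taxonomy as child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
}

def kmers(read, k):
    """Return every substring of length k in the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def lineage(taxon):
    """Return the path from the taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lowest_common_ancestor(taxa):
    """Return the deepest taxon that is an ancestor of (or equal to) every input taxon."""
    taxa = list(taxa)
    common = lineage(taxa[0])
    for taxon in taxa[1:]:
        other = set(lineage(taxon))
        common = [t for t in common if t in other]
    return common[0]

def classify(read, kmer_db, k):
    """Assign a read to the LCA of all taxa hit by its k-mers, or None if nothing matches."""
    hits = {kmer_db[m] for m in kmers(read, k) if m in kmer_db}
    return lowest_common_ancestor(hits) if hits else None

# Hypothetical 4-mer database; real databases use much longer k-mers.
KMER_DB = {"ACGT": "E. coli", "CGTA": "E. albertii", "GGGG": "Bacteria"}

print(classify("ACGTA", KMER_DB, 4))  # k-mers hit two species of the same genus
print(classify("TTTTT", KMER_DB, 4))  # no k-mer is in the database
```

In this toy example the first read matches k-mers from two different species, so it can only be placed at their shared genus, which is why some reads are never resolved to species level.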

Running Kraken

Kraken takes FASTQ files as input. Reads can be single- or paired-end.

Click a data node with FASTQ files
Click the Metagenomics section of the toolbox
Click Kraken

Kraken generates a Taxonomic data node. This data can be used as input for the Alpha & beta diversity and Choose taxonomic level tasks. To obtain the Kraken output files, select the new data node and choose Download data from the toolbox on the right.

Kraken parameters

Database name

Partek distributes the Kraken databases Bacteria, Human, Plasmids, Viruses, and MiniKraken. MiniKraken includes sequences from bacterial, archaeal, and viral genomes in RefSeq; however, it contains only 2.7% of the k-mers from the original database. Running Kraken with the MiniKraken database is significantly faster and less resource-intensive than using a full database, but the results will be less complete.

Generate unclassified reads

If enabled, Kraken also produces an Unaligned reads data node containing the reads that could not be classified. Default is Disabled.

Advanced options - Quick operation

If enabled, Kraken classifies each read using the first hit or hits instead of searching all of its k-mers (--quick). Default is Disabled.
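
For readers who know the standalone tool, the parameters above correspond roughly to options of the kraken2 command line. The sketch below builds such a command as an argument list. It is an assumption about how the options map, not a description of how Partek Flow invokes Kraken internally; the function name, file names, and database path are hypothetical, while the flags (--db, --report, --quick, --unclassified-out, --paired) are those of the kraken2 command-line tool.

```python
def build_kraken2_command(db_dir, reads, paired=False, quick=False,
                          unclassified_prefix=None, report_path="sample.kreport"):
    """Build an argument list for the standalone kraken2 command-line tool.

    db_dir              -- directory of the Kraken database (Database name)
    reads               -- list with one FASTQ path, or two paths for paired-end reads
    quick               -- corresponds to the Quick operation option
    unclassified_prefix -- if given, write unclassified reads (Generate unclassified reads)
    """
    cmd = ["kraken2", "--db", db_dir, "--report", report_path]
    if quick:
        cmd.append("--quick")
    if unclassified_prefix is not None:
        # With paired reads, kraken2 expects a '#' in the file name,
        # which it replaces with _1 and _2 for the two mates.
        suffix = "#.fq" if paired else ".fq"
        cmd += ["--unclassified-out", unclassified_prefix + suffix]
    if paired:
        cmd.append("--paired")
    cmd += list(reads)
    return cmd

# Hypothetical paths, for illustration only.
print(" ".join(build_kraken2_command(
    "/data/kraken/minikraken", ["s1_R1.fastq", "s1_R2.fastq"],
    paired=True, quick=True, unclassified_prefix="s1_unclassified")))
```

Returning a list rather than a single string keeps the command safe to pass to subprocess.run without shell quoting issues.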

Kraken task report

The Kraken task report presents a table and a graph summarizing the results (Figure 2).

Table

The table lists each sample and gives the following values:

Classified reads

The number of reads that were classified by Kraken for each sample.

Unclassified reads

The number of reads that were not classified by Kraken for each sample.

Taxonomic levels columns (Superkingdom, Kingdom, Phyla, Classes, Orders, Families, Genera, Species, No rank)

Lists the number of different taxa that were detected within each taxonomic level among the classified reads for each sample. For example, in Figure 2, there were 139 different species detected in Sample 1.
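
If you download the Kraken output files, a similar summary can be recomputed outside Partek Flow. The sketch below assumes the download includes a standard Kraken 2 report (six tab-separated columns: percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, name). The mapping from rank codes to the column names of the table, including treating every other code as No rank, is an assumption, and the sample counts are made up.

```python
from collections import Counter

# Assumed mapping from Kraken 2 rank codes to the column names used in the table.
RANK_COLUMNS = {"D": "Superkingdom", "K": "Kingdom", "P": "Phyla", "C": "Classes",
                "O": "Orders", "F": "Families", "G": "Genera", "S": "Species"}

def summarize_report(report_text):
    """Return (classified reads, unclassified reads, number of taxa per level)."""
    classified = 0
    unclassified = 0
    taxa_per_level = Counter()
    for line in report_text.splitlines():
        if not line.strip():
            continue
        _pct, clade_reads, _direct, rank, _taxid, _name = line.split("\t")
        if rank == "U":
            unclassified = int(clade_reads)
        elif rank == "R":
            # The root clade contains every classified read.
            classified = int(clade_reads)
        else:
            # Intermediate codes such as R1 or S1 are counted as "No rank" (assumption).
            taxa_per_level[RANK_COLUMNS.get(rank, "No rank")] += 1
    return classified, unclassified, dict(taxa_per_level)

# Made-up report rows, for illustration only.
SAMPLE_REPORT = "\n".join("\t".join(row) for row in [
    ("10.00", "10", "10", "U", "0", "unclassified"),
    ("90.00", "90", "0", "R", "1", "root"),
    ("90.00", "90", "0", "R1", "131567", "  cellular organisms"),
    ("90.00", "90", "0", "D", "2", "    Bacteria"),
    ("90.00", "90", "30", "P", "1224", "      Proteobacteria"),
    ("60.00", "60", "0", "G", "561", "        Escherichia"),
    ("40.00", "40", "40", "S", "562", "          Escherichia coli"),
    ("20.00", "20", "20", "S", "208962", "          Escherichia albertii"),
])

classified, unclassified, levels = summarize_report(SAMPLE_REPORT)
print(classified, unclassified, levels)
```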

Bar chart

The stacked bar chart shows the absolute or relative abundance of the different phyla in each sample. The legend lists the color of each phylum. Mouse over a bar to view the breakdown of families within that phylum (Figure 3). Use the radio button to switch between Absolute abundance and Relative abundance.
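
The difference between the two views is a simple rescaling. As a minimal sketch, assuming relative abundance means each phylum's read count divided by the sum of the counts shown for that sample (the exact denominator Partek Flow uses is not stated here), with a hypothetical function name and made-up counts:

```python
def relative_abundance(counts):
    """Convert absolute read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # A sample with no classified reads has no meaningful proportions.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}

# Made-up phylum counts for one sample.
sample_1 = {"Firmicutes": 30, "Proteobacteria": 10}
print(relative_abundance(sample_1))
```

Relative abundance makes samples with very different sequencing depths comparable, while absolute abundance preserves the information about how many reads each sample contributed.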

Hierarchical pie chart

Clicking the icon in the View column of the table opens an interactive Hierarchical pie chart that can be used to explore the taxonomies present in each sample (Figure 4). Mousing over a section of the pie chart shows the number of reads classified to this level, the number of reads in its children, the percentage of total reads represented by this group, and the percentage of the reads in the root that are represented by this group.
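
The four values shown on mouse-over can be understood as simple quantities on a taxonomy tree. The sketch below is one plausible reading, not Partek Flow's code: it assumes that "this group" means a taxon together with all of its descendants, and the class, function, and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Taxon:
    name: str
    direct_reads: int                      # reads classified exactly to this level
    children: list = field(default_factory=list)

    def clade_reads(self):
        """Reads assigned to this taxon or to any of its descendants."""
        return self.direct_reads + sum(c.clade_reads() for c in self.children)

def hover_stats(node, root, total_reads):
    """Compute the four mouse-over values for a node, relative to the current root."""
    in_children = sum(c.clade_reads() for c in node.children)
    group = node.direct_reads + in_children
    return {
        "reads_at_level": node.direct_reads,
        "reads_in_children": in_children,
        "pct_of_total": 100 * group / total_reads,
        "pct_of_root": 100 * group / root.clade_reads(),
    }

# Made-up tree: 100 reads under Bacteria out of 200 reads in the sample.
escherichia = Taxon("Escherichia", 20, [Taxon("E. coli", 50), Taxon("E. albertii", 20)])
bacteria = Taxon("Bacteria", 10, [escherichia])
print(hover_stats(escherichia, bacteria, total_reads=200))
```

Under this reading, zooming in changes only the root used for the last percentage; the percentage of total reads stays the same.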

Clicking a section of the pie chart zooms in to that section by setting the selected level as the root (Figure 5).

Click the green arrow to move up one taxonomic level (set the root to one level higher). Click reset to show the entire pie chart.

The mini-map on the upper left is shaded in green to indicate which section of the original pie chart is currently shown (Figure 5).

References

1. Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.

2. Wood DE, Lu J, Langmead B: Improved metagenomic analysis with Kraken 2. Genome Biology 2019, 20:257.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Choose taxonomic level

Introduction
Running the Choose Taxonomic Level Task
Download a count matrix

Introduction

The Choose taxonomic level task generates a count matrix summarizing the number of reads that have been classified by Kraken for each taxon in each sample, at a given taxonomic level. The counts give a measure of the relative abundance of each taxon, which can be used for downstream analysis and visualization as if it were gene expression count data.
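
Conceptually, the count matrix has one entry per taxon per sample, with zero where a taxon was not detected. As a minimal sketch with hypothetical function and sample names and made-up counts (not Partek Flow's internal representation):

```python
def count_matrix(per_sample_counts):
    """Return (taxa, samples, rows), where rows[i][j] is the count of taxa[i] in samples[j]."""
    samples = sorted(per_sample_counts)
    taxa = sorted({t for counts in per_sample_counts.values() for t in counts})
    # Taxa missing from a sample get a count of zero.
    rows = [[per_sample_counts[s].get(t, 0) for s in samples] for t in taxa]
    return taxa, samples, rows

# Made-up family-level read counts for two samples.
family_counts = {
    "Sample 1": {"Enterobacteriaceae": 120, "Lactobacillaceae": 30},
    "Sample 2": {"Enterobacteriaceae": 80, "Bacteroidaceae": 15},
}

taxa, samples, rows = count_matrix(family_counts)
for taxon, row in zip(taxa, rows):
    print(taxon, row)
```

Because every sample shares the same set of rows, the result has the same shape as a gene expression count matrix, with taxa in place of genes.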

Running the Choose Taxonomic Level Task

The task can be performed on a Taxonomic data node, which is the output from a Kraken task.

Click a Taxonomic data node
Select Choose taxonomic level from the Metagenomics section of the toolbox
Check one or more taxonomic levels. The options are Superkingdom, Kingdom, Phylum, Class, Order, Family, Genus, or Species (Figure 1). A separate output data node will be generated for each selected level (Figure 2).

Which taxonomic level to choose depends on your research question and on the level at which you want to perform downstream analysis. For example, if you want to know which families of bacteria are the most abundant in your sample, choose the family level. If you want to see which species are differentially abundant between groups of samples, choose the species level.

Download a count matrix

To export the count matrix for a taxonomic level, select the output data node and choose Download data from the toolbox. You can choose to put the features on the columns or rows (Figure 3). The 'features' in this context are the taxa. For example, if a Phylum data node is downloaded, the features will be different phyla. The download will be a tab-delimited text file with read counts for each sample (Figure 4).
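
A downloaded matrix can be read with any tool that handles tab-delimited text. The sketch below uses only the Python standard library and assumes a layout with a header row and one label column, with features on the columns; the actual layout of the download is shown in Figure 4 and may differ. The function names and the embedded sample data are hypothetical.

```python
import csv
import io

def read_counts(tsv_text):
    """Parse tab-delimited text into {row label: {column label: count}}."""
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    return {r[0]: {name: int(v) for name, v in zip(header[1:], r[1:])} for r in body}

def transpose(table):
    """Swap rows and columns, e.g. from features on columns to features on rows."""
    out = {}
    for row_key, cols in table.items():
        for col_key, value in cols.items():
            out.setdefault(col_key, {})[row_key] = value
    return out

# Made-up phylum-level download with samples on rows and features on columns.
TSV = "Sample\tFirmicutes\tProteobacteria\nS1\t120\t30\nS2\t80\t60\n"

by_sample = read_counts(TSV)
by_phylum = transpose(by_sample)
print(by_sample["S1"]["Firmicutes"], by_phylum["Proteobacteria"]["S2"])
```

For a real file, replace the embedded string with the contents of the downloaded file, for example via open(path, newline="").read().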

Downstream Analysis

The taxon-level count data node(s) behave like any other count matrix in Partek Flow. This means you can perform most of the tasks you would normally perform on gene expression data. For example, you can normalize the counts, perform principal components analysis (PCA), and use ANOVA to detect differentially abundant species in different groups of samples (Figure 5). Additional visualizations can also be generated including heatmaps, volcano plots, dot plots, and more.
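
To make the idea of normalizing taxon counts concrete, here is a sketch of one common scheme borrowed from gene expression analysis: counts per million followed by a log2 transform with an offset. It is shown only as an illustration; Partek Flow offers its own normalization options, and nothing here is meant to describe which of them it applies. The function name and the offset default are assumptions.

```python
import math

def log2_cpm(counts, offset=1.0):
    """Scale counts to counts per million, then apply log2(value + offset)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total reads")
    return {taxon: math.log2(n / total * 1_000_000 + offset)
            for taxon, n in counts.items()}

# Made-up species counts for two samples with different sequencing depths.
shallow = {"E. coli": 50, "B. fragilis": 50}
deep = {"E. coli": 5000, "B. fragilis": 5000}

# After normalization the two samples give identical values,
# because the proportions of the species are the same.
print(log2_cpm(shallow) == log2_cpm(deep))
```

The offset avoids taking the logarithm of zero for taxa that were not detected in a sample.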

