
Post-alignment tools

Post-alignment tools are tasks that operate on aligned data. They are typically used to prepare aligned data for downstream analyses, such as DNA-Seq or RNA-Seq analysis.

To invoke Post-alignment tools, click on an Aligned reads data node (Figure 1). There are five functions available in Post-alignment tools:

Filter alignments
Convert alignments to unaligned reads
Combine alignments
Deduplicate UMIs
Downscale alignments

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Filter alignments

Introduction
Removing duplicates
Remove alignments with mismatches

Introduction

The Filter alignments task can be used to filter aligned reads data using specified parameters. To invoke the task, click on an Aligned reads data node and select Filter alignments. By default, this task removes low-quality reads, singletons and unaligned read information stored within the BAM/SAM file (Figure 1).
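The default behavior can be pictured as three per-record checks. The sketch below is a minimal illustration, not Partek Flow's implementation: the `Alignment` record is hypothetical, "low-quality" is assumed to mean a mapping quality (MAPQ) below a hypothetical `min_mapq` threshold, and a singleton is assumed to be a paired read whose mate did not align.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record (real BAM records carry many more fields)."""
    read_name: str
    is_unmapped: bool
    is_paired: bool
    mate_is_unmapped: bool
    mapq: int


def default_filter(alignments, min_mapq=10):
    """Drop unaligned reads, singletons and low-quality alignments.

    min_mapq is a hypothetical threshold chosen for illustration only.
    """
    kept = []
    for aln in alignments:
        if aln.is_unmapped:
            continue  # unaligned read information stored in the BAM/SAM file
        if aln.is_paired and aln.mate_is_unmapped:
            continue  # assumed singleton: paired read whose mate did not align
        if aln.mapq < min_mapq:
            continue  # assumed low-quality alignment: MAPQ below the threshold
        kept.append(aln)
    return kept
```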

Removing duplicates

Users also have the option to remove duplicate reads in aligned data. For DNA-Seq analysis, this is typically performed to minimize redundant variant calling information. To remove duplicates, click on the Remove duplicates checkbox (Figure 2).

Select the number of reads you want to keep. Then specify when alignments are treated as duplicates: either when reads map to the same start position, or when they map to the same start position and also have the same sequence. You can also select whether to keep the read with the highest mapping score or a randomly selected duplicate.
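The three options above (how many reads to keep, how duplicates are defined, and which read survives) can be sketched as a group-and-select step. This is an illustrative sketch under stated assumptions, not the task's actual code: the `Alignment` record and the `remove_duplicates` parameter names are hypothetical, and "same start position" is taken to mean the same chromosome and start coordinate.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record."""
    read_name: str
    chrom: str
    start: int
    sequence: str
    mapping_score: float


def remove_duplicates(alignments, keep=1, match_sequence=False, keep_best=True, seed=None):
    """Keep at most `keep` alignments from each group of duplicates."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for aln in alignments:
        key = (aln.chrom, aln.start)  # duplicates share a start position
        if match_sequence:
            key += (aln.sequence,)  # optionally also require an identical sequence
        groups[key].append(aln)

    kept = []
    for group in groups.values():
        if keep_best:
            # Highest mapping score first.
            ranked = sorted(group, key=lambda a: a.mapping_score, reverse=True)
        else:
            # Random order, so the survivors are randomly selected duplicates.
            ranked = rng.sample(group, len(group))
        kept.extend(ranked[:keep])
    return kept
```

Adding the sequence to the grouping key makes the definition stricter, so fewer reads are treated as duplicates.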

Remove alignments with mismatches

To remove alignments with mismatches, select the Remove alignments with mismatches check box. Using the selector, specify the number of mismatched bases that must be exceeded for an alignment to be excluded (Figure 3). Note that mismatches also include insertions and deletions.
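Because insertions and deletions count as mismatches, the total can be pictured as substitutions plus inserted and deleted bases. The sketch below assumes the substitution count is already known and reads indel lengths from the alignment's CIGAR string (the `I` and `D` operations in the SAM format); the function names are hypothetical.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_mismatches(cigar, substitutions):
    """Total mismatched bases: substitutions plus inserted and deleted bases."""
    indel_bases = sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "ID")
    return substitutions + indel_bases


def passes_mismatch_filter(cigar, substitutions, max_mismatches):
    """Keep the alignment unless its mismatch count exceeds the selected threshold."""
    return count_mismatches(cigar, substitutions) <= max_mismatches
```

For example, `count_mismatches("50M2I48M", 1)` counts one substitution and two inserted bases.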

Convert alignments to unaligned reads

Aligned reads can be converted to unaligned reads in Partek Flow. The task is available under Post-alignment tools in the task menu when any Aligned reads data node is selected, whether that node was produced by an aligner in Partek Flow or contains data that was aligned before import.

Generating unaligned reads from aligned data gives you the flexibility to remap the reads using a different aligner, a different set of alignment parameters, or a different genome reference. This is particularly useful when analyzing sequences from xenograft models, where the same set of reads can be aligned to two different species. It may also be useful if the original unaligned FASTQ files are less accessible to the user than the aligned BAM files.

To perform the task, select an Aligned reads data node and click Convert alignments to unaligned reads in the task menu (Figure 1).

During the conversion, the BAM files are converted to FASTQ files and a new Unaligned reads data node is generated (Figure 2).

The FASTQ file names are based on the sample names in the Data tab. The generated files are compressed and have the extension *.fq.gz. For samples containing BAM files with paired-end reads, two FASTQ files are generated for each sample, and _1 and _2 are appended to the file names. The example in Figure 3 shows 18 .fq.gz output files generated from 9 BAM files.
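The naming rule can be sketched as follows. This is an assumption-laden illustration: the exact position of the `_1`/`_2` suffix relative to the extension is assumed, and `fastq_output_names` and `fastq_record` are hypothetical helpers. The second function shows the standard four-line FASTQ record that each converted read becomes.

```python
def fastq_output_names(sample_name, paired_end):
    """Assumed naming: <sample>.fq.gz, or <sample>_1.fq.gz and <sample>_2.fq.gz for paired-end."""
    if paired_end:
        return [f"{sample_name}_1.fq.gz", f"{sample_name}_2.fq.gz"]
    return [f"{sample_name}.fq.gz"]


def fastq_record(name, sequence, qualities):
    """Format one read as a four-line FASTQ record."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality strings must be the same length")
    return f"@{name}\n{sequence}\n+\n{qualities}\n"
```

With nine paired-end samples, the helper yields two names per sample, 18 files in total, which matches the count described above.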

Combine alignments

The Combine alignments task can potentially maximize the number of reads that map to a region. When the reads left unaligned by one aligner are aligned using a different aligner, the two resulting alignments can be combined for downstream analysis within Partek Flow. Note that this can only be done if both alignments used the same reference genome.

To invoke this task, click an Aligned reads data node and select Combine alignments (Figure 1).

A list of compatible alignments will appear. The color of the text signifies the layer the alignment corresponds to (Figure 2). Select the alignment you would like to combine and click Finish.

The resulting Aligned reads data node can now be used for downstream analysis (Figure 3).

Note that this task combines the files in the data node within Partek Flow but does not merge the BAM files. Downloading an Aligned reads data node from a Combine alignments task will therefore result in multiple BAM files per sample.
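One way to picture this behavior is as a per-sample list of BAM files that grows without any file being merged. The sketch below models data nodes as hypothetical dictionaries and also enforces the same-reference-genome requirement; the structure, function name, and file names are illustrative assumptions.

```python
def combine_alignments(node_a, node_b):
    """Combine two hypothetical data-node dicts without merging any BAM files."""
    if node_a["reference"] != node_b["reference"]:
        raise ValueError("Both alignments must use the same reference genome")
    samples = sorted(node_a["files"].keys() | node_b["files"].keys())
    files = {
        # Each sample keeps every BAM file from both inputs, side by side.
        sample: node_a["files"].get(sample, []) + node_b["files"].get(sample, [])
        for sample in samples
    }
    return {"reference": node_a["reference"], "files": files}
```

Because the files are only listed together, downloading the combined node returns more than one BAM file per sample.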

Deduplicate UMIs

Configuring Deduplicate UMIs
Deduplicate UMIs task report

The Deduplicate UMIs task identifies and removes reads mapped to the same chromosomal location with duplicate unique molecular identifiers (UMIs). The details of our UMI deduplication methods are outlined in the UMI Deduplication in Partek Flow white paper.
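As a rough picture of the idea, reads that share a mapping location and a UMI collapse to a single read. The sketch below is a simplified assumption, not the method described in the white paper. It groups by cell barcode, chromosome, position, and UMI (including the barcode is an assumption), keeps the first read in each group, and applies no UMI error correction. The `Read` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified aligned read carrying UMI and cell barcode sequences."""
    name: str
    barcode: str
    umi: str
    chrom: str
    position: int


def deduplicate_umis(reads):
    """Keep the first read seen for each (barcode, chrom, position, UMI) combination."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.barcode, read.chrom, read.position, read.umi)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique
```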

To invoke Deduplicate UMIs:

Click an Aligned reads data node
Click Post-alignment tools in the toolbox
Click Deduplicate UMIs

The task configuration dialog content depends on whether you imported FASTQ files or BAM files into Partek Flow.

Configuring Deduplicate UMIs

Imported FASTQ

UMIs and barcodes are detected and recorded by the Trim tags task in Partek Flow. You can choose whether to retain only one alignment per UMI or not (Figure 1). The default will depend on which prep kit was used in the Trim tags task.

If you select Retain only one alignment per UMI, you will be asked to choose an assembly and gene/feature annotation file. The annotation file is used to check whether a read overlaps an exonic region. Only reads that have 50% overlap with an exon will be retained.
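The exonic check can be sketched as an interval-overlap calculation. Several details here are assumptions rather than documented behavior: coordinates are treated as half-open intervals, the overlap fraction is measured against the read length, "50% overlap" is read as "at least 50%", and a read is tested against each exon separately. The function names are hypothetical.

```python
def overlap_fraction(read_start, read_end, exon_start, exon_end):
    """Fraction of the read [read_start, read_end) covered by one exon [exon_start, exon_end)."""
    overlap = min(read_end, exon_end) - max(read_start, exon_start)
    return max(overlap, 0) / (read_end - read_start)


def overlaps_exon(read_start, read_end, exons, min_fraction=0.5):
    """True if any single exon covers at least `min_fraction` of the read."""
    return any(
        overlap_fraction(read_start, read_end, start, end) >= min_fraction
        for start, end in exons
    )
```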

If you do not select Retain only one alignment per UMI, UMI deduplication will proceed without filtering to exonic reads. Other differences between the two options are outlined in the UMI Deduplication in Partek Flow white paper.

Imported BAM

BAM files generated by other tools can be imported into Partek Flow and deduplicated by the software. The task configuration dialog includes additional options for specifying where the UMI and cell barcode information is stored; this information is typically stored as tags on each alignment record in the BAM file. Enter the tag names in the text fields. For example, when processing a BAM file produced by CellRanger 3.0.1, the tag for the UMI sequence is UR and the tag for the barcode sequence is CR (Figure 2).
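To show where such tags live, the sketch below reads them from one alignment line in SAM text form. In the SAM format, the first 11 tab-separated fields are mandatory, and each later field is an optional `TAG:TYPE:VALUE` entry. The function names and the example read are hypothetical. The default tag names follow the CellRanger 3.0.1 example above.

```python
def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM alignment line as {tag: value}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # fields 1-11 are the mandatory SAM columns
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def get_umi_and_barcode(sam_line, umi_tag="UR", barcode_tag="CR"):
    """Look up the UMI and cell barcode using user-specified tag names."""
    tags = parse_tags(sam_line)
    return tags.get(umi_tag), tags.get(barcode_tag)
```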

The option to Retain only one alignment per UMI is also available when starting from a BAM file.

Deduplicate UMIs task report

The Deduplicate UMIs task report includes a knee plot showing the number of deduplicated reads per barcode. This plot is used to filter the barcodes to include only barcodes corresponding to cells. For more information about using the knee plot to filter barcodes, please see the Cell Barcode QA/QC page. One difference between the Deduplication report and the Cell Barcode QA/QC report is that the Deduplication report gives the number of initial alignments and the number of deduplicated alignments for each sample (Figure 3). This indicates how many of your aligned reads were PCR duplicates and how many were unique molecules.

The initial number of cells is set by our automatic filter. You can set the filter manually by clicking on the plot or by typing a cutoff number in the Cells or Reads in cells text boxes. If there are multiple samples, each sample receives a plot and filters are set per sample.

The number of cells, reads in cells, median reads per cell, number of initial alignments, and number of deduplicated alignments are listed for each sample in the summary table (Figure 4).
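The summary statistics can be sketched from per-barcode deduplicated read counts. This sketch assumes that a manual "Cells" cutoff keeps the top N barcodes ranked by read count, as on a knee plot, and that the deduplicated-alignment total is the sum over all barcodes. The automatic filter is not reproduced, and the function name and the `duplicate_alignments` field are hypothetical.

```python
from statistics import median


def summarize_sample(reads_per_barcode, n_cells, initial_alignments):
    """Summary-table values for one sample, given a 'Cells' cutoff of n_cells barcodes."""
    # Knee-plot order: barcodes ranked by deduplicated read count, highest first.
    counts = sorted(reads_per_barcode.values(), reverse=True)
    cell_counts = counts[:n_cells]
    deduplicated = sum(counts)
    return {
        "cells": len(cell_counts),
        "reads_in_cells": sum(cell_counts),
        "median_reads_per_cell": median(cell_counts) if cell_counts else 0,
        "initial_alignments": initial_alignments,
        "deduplicated_alignments": deduplicated,
        # Difference between initial and deduplicated alignments, i.e. PCR duplicates.
        "duplicate_alignments": initial_alignments - deduplicated,
    }
```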

Clicking Apply filter at either the knee plot or the summary table will run the Filter barcodes task and generate a Filtered reads data node.

To return to the knee plot, click Back to filter.

To reset the filters for all samples to the automatic cutoff, click Reset all filters.

Downscale alignments

The Downscale alignments task can be invoked on any data node containing BAM/SAM files, such as an Aligned reads data node. The only parameter is the percentage of alignments to keep in the results, which must be between 0 and 100 (Figure 1). The same percentage is applied to all samples in the input data node. The output data node contains BAM files with a subset of the alignments.
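Downscaling amounts to random subsampling at a fixed percentage. The sketch below is one possible approach, not the task's documented method. It draws exactly `round(n * percent / 100)` alignments without replacement and preserves their original order. The `downscale` function and its `seed` parameter are hypothetical. The sketch also treats each alignment independently, so it does not keep paired-end mates together.

```python
import random


def downscale(alignments, percent, seed=None):
    """Randomly keep `percent` percent of the alignments, preserving their order."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)
    k = round(len(alignments) * percent / 100)
    indices = sorted(rng.sample(range(len(alignments)), k))
    return [alignments[i] for i in indices]
```

To mirror the task, apply the same `percent` to every sample in the input data node.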
