
Pre-alignment tools

Partek Flow provides Pre-alignment tools that allow the user to process next-generation sequencing data before alignment. These tools are useful not only for controlling data quality but also for subsampling the data before analyzing the full dataset. There are four functions available in Pre-alignment tools:

Trim bases
Trim adapters
Filter reads
Trim tags

Users are expected to have a preliminary understanding of:

File formats for next-generation sequencing data
Phred quality scores

Showing Pre-alignment tools

To show the Pre-alignment tools, select an Unaligned reads or Trimmed reads data node. The tools will appear in the context-sensitive menu on the right of the screen (Figure 1).

Different Pre-alignment tools are available for different formats of unaligned reads. For example, if the reads are in FASTQ format, all four tools are available. If the unaligned reads are in FASTA or SFF format, the Filter reads option is not available.

Additional Assistance

If you need additional assistance, please visit to submit a help ticket or find phone numbers for regional support.

Trim bases

The Trim bases task trims bases from the 5'-end or 3'-end of the reads. The most common reason to use Trim bases is to remove poor-quality bases before alignment, because these can reduce the alignment rate.

The task allows the user to trim reads in different ways (Figure 1), including:

Trim bases based on quality score
Trim bases from 3'-end
Trim bases from 5'-end
Trim bases from both ends

Trim bases from 5'-end and Trim bases from 3'-end (Figures 2-3) remove a fixed number of bases from the 5'- or 3'-end of the reads. These two functions are useful when the read length is constant. They are not recommended if the read length varies, since good-quality bases from shorter reads are likely to be trimmed away.
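As a minimal sketch of what fixed-length trimming does (an illustration in Python, not Partek Flow's implementation; the function name trim_fixed is hypothetical), the example below removes a set number of bases from either end of a read and its quality string. It also shows why a fixed 3' trim costs a shorter read a larger share of its bases.

```python
def trim_fixed(seq, qual, from_5p=0, from_3p=0):
    """Remove a fixed number of bases from the 5' and/or 3' end of one read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq) - from_3p
    if end <= from_5p:
        return "", ""  # the whole read is trimmed away
    return seq[from_5p:end], qual[from_5p:end]


# Trim 3 bases from the 3'-end of a 10-base read and of a 6-base read.
long_seq, long_qual = trim_fixed("ACGTACGTAC", "IIIIIIIIII", from_3p=3)
short_seq, short_qual = trim_fixed("ACGTAC", "IIIIII", from_3p=3)
print(long_seq, short_seq)
```

The 10-base read keeps 7 bases, while the 6-base read keeps only 3, which is the reason fixed trimming suits constant-length reads best.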

Trim bases from both ends (Figure 4) keeps only the bases between a fixed start and end position of the reads. This is particularly useful if poor-quality bases are observed on both ends of the read. Instead of trimming successively from the 5'- and 3'-ends, the trimming is performed once from both ends.
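Keeping a fixed window of each read can be pictured as a single slice. The sketch below is illustrative only; it assumes 1-based, inclusive start and end positions, which may differ from how Partek Flow interprets its fields, and keep_window is a hypothetical name.

```python
def keep_window(seq, qual, start, end):
    """Keep bases from 1-based position `start` through `end`, inclusive."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return seq[start - 1:end], qual[start - 1:end]


# Two poor-quality bases on each end are dropped in one pass.
kept_seq, kept_qual = keep_window("NNACGTACGTNN", "##IIIIIIII##", start=3, end=10)
print(kept_seq, kept_qual)
```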

Trim bases based on quality score (Figure 5) is probably the most useful function for trimming poor-quality bases from the 5'- or 3'-ends of reads. This function trims bases dynamically depending on their quality scores. Trimming can be done from the 5'-end, the 3'-end, or both ends of the reads. The function evaluates each base starting from the end of the read and trims bases away until it reaches a base with a quality score greater than the specified threshold. For an extensive evaluation of read trimming effects on Illumina NGS data analysis, see Del Fabbro et al. [1].
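The quality-based rule described above can be sketched in a few lines of Python. This is a simplified illustration rather than Partek Flow's code: the function name, the ends argument, and the default Phred+33 offset are assumptions made for the example.

```python
def trim_by_quality(seq, qual, threshold, offset=33, ends="both"):
    """Trim bases from the chosen end(s) until a base scores above `threshold`."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    if ends in ("3p", "both"):
        # Walk inward from the 3'-end while bases are at or below the threshold.
        while end > start and scores[end - 1] <= threshold:
            end -= 1
    if ends in ("5p", "both"):
        # Walk inward from the 5'-end in the same way.
        while start < end and scores[start] <= threshold:
            start += 1
    return seq[start:end], qual[start:end]


# '#' is Q2, '5' is Q20 and 'I' is Q40 under Phred+33.
trimmed_seq, trimmed_qual = trim_by_quality("ACGTACGT", "#5IIII5#", threshold=20)
print(trimmed_seq, trimmed_qual)
```

Because trimming stops at the first base above the threshold, low-quality bases in the middle of a read are left in place.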

Advanced options

In some cases, base trimming leaves reads that are very short and therefore not recommended for alignment. Partek Flow provides the option to set a Min read length after base trimming, which discards reads that are shorter than the set length.

Reads can also contain a high percentage of Ns (ambiguous bases). The Max N setting discards reads whose percentage of Ns is higher than the set threshold.

The Quality encoding option refers to how the Phred quality score is encoded in the FASTQ input file. The available options are Phred+33, Phred+64, Solexa+64 and Integers. Selecting Auto-detect will determine whether the quality encoding is Phred+33 or Phred+64. For Solexa data, you will need to select Solexa+64. For most datasets, the Auto-detect option works very well; the exceptions are cases where the base quality scores fall into the grey (ambiguous) zone between Phred+33 and Phred+64. If the quality encoding scheme is known, we recommend selecting the encoding format directly from the Quality encoding list.
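The two discard rules can be combined into one check per read. The sketch below is illustrative: the function name is hypothetical, the Min read length default of 25 is the default quoted in this document, and the 10% Max N value is an arbitrary example rather than a documented default.

```python
def passes_filters(seq, min_read_length=25, max_n_percent=10.0):
    """Return True if a trimmed read survives the Min read length and Max N checks."""
    if len(seq) == 0 or len(seq) < min_read_length:
        return False  # too short to keep after trimming
    n_percent = 100.0 * seq.upper().count("N") / len(seq)
    return n_percent <= max_n_percent


print(passes_filters("A" * 30))            # long enough, no Ns
print(passes_filters("A" * 20))            # shorter than 25 bases
print(passes_filters("A" * 26 + "N" * 4))  # about 13% Ns
```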
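To make the encoding options more concrete, the sketch below decodes a quality string under three of the schemes and applies a simple range heuristic to tell Phred+33 from Phred+64. It is not Partek Flow's Auto-detect logic; the cut-off characters (ASCII 59 and 74) are assumptions based on the usual ranges of these encodings, and the function names are hypothetical.

```python
import math


def decode_quality(qual, encoding="Phred+33"):
    """Convert a FASTQ quality string to a list of Phred scores."""
    if encoding == "Phred+33":
        return [ord(c) - 33 for c in qual]
    if encoding == "Phred+64":
        return [ord(c) - 64 for c in qual]
    if encoding == "Solexa+64":
        # Solexa scores use a different scale; convert each one to Phred.
        return [10 * math.log10(10 ** ((ord(c) - 64) / 10) + 1) for c in qual]
    raise ValueError(f"unknown encoding: {encoding}")


def guess_encoding(quality_strings):
    """Heuristically pick Phred+33 or Phred+64, or report an ambiguous range."""
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return "ambiguous"
    if min(codes) < 59:
        return "Phred+33"  # characters this low do not occur in the +64 schemes
    if max(codes) > 74:
        return "Phred+64"  # assumption: Phred+33 data rarely exceeds 'J' (Q41)
    return "ambiguous"     # every character is valid under both schemes


print(decode_quality("I"), decode_quality("h", "Phred+64"))
print(guess_encoding(["@ABCDEFGHIJ"]))
```

A file whose quality characters all fall between '@' and 'J' is the kind of grey-zone case where choosing the encoding manually is safer.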

Figure 6 shows the options available for each of the Trim bases functions. Note that the default Min read length is 25 bp. For microRNA sequencing data, the Min read length needs to be set to a smaller value (we recommend 15) to account for mature microRNAs.

Trim Bases Task Details Page

The Task Details page for Trim bases can be accessed by selecting the Trim bases task node and then selecting Task Details from the Task results section. On the Task Details page, several sections are available:

General task information: contains information such as the task name, owner, status, submitted time, start, end and duration of the task
Output Files: contains the description of each output file. If you hover your mouse cursor over the file name, the exact location of the file on the server is shown. If you click the file name, you will have the option to view up to 999 lines of the raw data. You can also download the file from the server.
Input Files: lists all the input files used in the Trim bases task.

Trim Bases Task Report Page

The Trim bases Task Report page can be accessed by selecting either the Trim bases task node or Trimmed reads data node and then selecting the Task Report from the Task results section of the context sensitive menu. There is a link at the bottom of the page to directly go to the Task Details page. The page displays the following components:

Summary table: gives the total number of reads in each sample, the total number of reads trimmed (i.e. with at least one base trimmed from the read), the total number of reads removed (due to the Min read length and Max N parameters), the average number of bases trimmed per read, and the average read quality before and after trimming.
Stacked bar-chart: shows the percentage of untrimmed, trimmed and removed reads in each sample so that all samples can be compared.
Average base quality score per position of trimmed reads: shows the average base quality score at each position of the trimmed reads for all samples in the project.

Trim Bases Output Files

The Trim bases function produces trimmed unaligned reads in a Trimmed Reads data node. The word "trimmed" is appended to the file names in the Trimmed Reads node. The Trimmed Reads data can be downloaded by selecting the Trimmed Reads node and then selecting Download data from the context-sensitive menu. Alternatively, if you have access to the Partek Flow server, you can go to the Task Details page and find the location of the output files in the Output Files section, as described in the Trim Bases Task Details Page section above. The Trimmed Reads data node has the same format as the raw data.

References

[1] Del Fabbro C, Scalabrin S, Morgante M, Giorgi FM. An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS ONE. 2013; 8(12): e85024.


Trim adapters

The presence of adapter sequences at the 5'-end or 3'-end of the reads has been shown to be one of the major problems during alignment, causing reads to remain unaligned. Removing adapter sequences is therefore essential when the sequenced read length is longer than the molecule of interest, such as microRNA. Because mature microRNAs are short, it is almost certain that the adapter sequence will be sequenced at the 3'-end of the miRNA.

To find out whether microRNA data has been adapter-trimmed, look at the pre-alignment QA/QC of the raw data, specifically the read length distribution. If the read length distribution peaks at approximately 22-23 bases, this usually means the data has been adapter-trimmed. If instead all reads have the same fixed length, the data has very likely not been adapter-trimmed, and you will need to get the adapter sequence from your vendor or service provider and use the Trim adapters function to trim it away.
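If you want to check the read length distribution outside Partek Flow, a rough sketch like the one below can help. It assumes standard four-line FASTQ records (no wrapped sequence lines) and uses toy reads; the function name is hypothetical.

```python
from collections import Counter


def read_length_distribution(fastq_lines):
    """Count read lengths from an iterable of FASTQ lines (4 lines per record)."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            lengths[len(line.strip())] += 1
    return lengths


# Toy data: three reads of 22, 22 and 23 bases.
example = []
for name, seq in [("r1", "A" * 22), ("r2", "C" * 22), ("r3", "G" * 23)]:
    example += [f"@{name}", seq, "+", "I" * len(seq)]

dist = read_length_distribution(example)
peak_length, peak_count = dist.most_common(1)[0]
print(peak_length, peak_count)
```

A single length shared by every read suggests untrimmed data, while a peak near 22-23 bases with a spread around it suggests the adapters have already been removed.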

Partek Flow software wraps Cutadapt [1], a widely used tool for adapter trimming. It can be used to trim adapter sequences in nucleotide-space data as well as color-space data.
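To illustrate the idea of 3' adapter trimming, here is a deliberately simple Python sketch. It is not Cutadapt's algorithm, which tolerates errors when aligning the adapter to the read; this version only handles exact matches, including a partial adapter at the very end of the read. The function name, the min_overlap parameter and the adapter string are all made up for the example.

```python
def trim_3p_adapter(seq, adapter, min_overlap=3):
    """Remove an exact 3' adapter match, or a partial adapter at the read end."""
    full = seq.find(adapter)
    if full != -1:
        return seq[:full]  # drop the adapter and everything after it
    # Otherwise check whether the read ends with a prefix of the adapter.
    longest = min(len(adapter) - 1, len(seq))
    for overlap in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:len(seq) - overlap]
    return seq


adapter = "TGGAATTC"  # example adapter string for this sketch only
print(trim_3p_adapter("ACGTACGTTGGAATTCAA", adapter))  # full adapter present
print(trim_3p_adapter("ACGTACGTTGGA", adapter))        # adapter cut off by read end
print(trim_3p_adapter("ACGTACGT", adapter))            # no adapter
```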

To use the Trim adapters function, you will need to know the adapter sequences. To trigger the Trim adapters function, please select Unaligned Reads

Filter reads

Next-generation sequencing (NGS) data files are notably large. Working with NGS data is not only time-consuming but also puts constraints on hard disk space. This is especially true if analysis parameters need to be optimized. The Filter reads task is a useful way to obtain a subset of the raw data on which optimization can be performed. The optimized parameters can then be saved and applied to the whole dataset.

Filter reads is only available for unaligned reads in FASTQ format. Select the Unaligned Reads data node, then select Filter reads from the Pre-alignment tools section of the menu.

There are two options to filter reads: Subsample reads and Filter by read length.

To Subsample reads, specify that one read should be kept out of every n reads. For example, if the user specifies "Keep one read for every 10 reads" (Figure 1), the program keeps only 1 read out of every 10. This is equivalent to keeping 10% of the data.
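The subsampling rule can be sketched as follows. This is an illustration, not Partek Flow's implementation: it assumes four-line FASTQ records and keeps the first read of each group of n, which is an arbitrary choice made for the example.

```python
def subsample_fastq(fastq_lines, n):
    """Keep one read out of every `n` reads (4 FASTQ lines per read)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    kept = []
    for i, line in enumerate(fastq_lines):
        read_index = i // 4  # each read occupies four consecutive lines
        if read_index % n == 0:  # assumption: keep the first read of each group
            kept.append(line)
    return kept


# Build 20 toy reads and keep one read for every 10 reads.
toy = []
for k in range(20):
    toy += [f"@read{k}", "ACGT", "+", "IIII"]
subset = subsample_fastq(toy, 10)
print(len(subset) // 4)  # number of reads kept
```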

To Filter by read length, set the read length limits by choosing the minimum and maximum read length(s) to keep.
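Filtering by read length amounts to a range check on each read. In the sketch below, reads are (name, sequence, quality) tuples and both limits are treated as inclusive; both choices are assumptions made for illustration.

```python
def filter_by_length(reads, min_length, max_length):
    """Keep (name, sequence, quality) reads within the inclusive length limits."""
    return [read for read in reads if min_length <= len(read[1]) <= max_length]


toy_reads = [
    ("r1", "A" * 15, "I" * 15),
    ("r2", "A" * 22, "I" * 22),
    ("r3", "A" * 40, "I" * 40),
]
kept = filter_by_length(toy_reads, min_length=18, max_length=30)
print([name for name, _, _ in kept])
```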


Trim tags

What is Trim tags?
Running Trim tags
Building a custom prep kit

What is Trim tags?

The Trim tags task allows you to process unaligned read data with adaptors, barcodes, and UMIs using a Prep kit file that specifies the configuration of these elements in your NGS reads.

Running Trim tags

Click an Unaligned reads data node
Click the Pre-alignment QA/QC section of the toolbox
Click Trim tags

There are three parameters to configure - Prep kit, Keep untrimmed, and Map feature barcodes.

Selecting Keep untrimmed will generate a separate unaligned reads data node with any reads that do not match the structure specified by the prep kit. This option is off by default, to save on disk space. Selecting Map feature barcodes is only necessary for processing protein data from 10x Genomics' Feature Barcoding assay (v3+ chemistry). For single cell gene expression data, leave this option unchecked.

Partek distributes prep kits for processing several types of data:

10x Chromium Single Cell 3' v2
10x Chromium Single Cell 3' v3
10x Chromium Single Cell 5'
Drop-seq

If your data is from one of these sources, you can select the appropriate option in the Prep kit drop-down menu. If the data is from another source, you can build a custom prep kit file to process your data.

Choose a Prep kit from the drop-down menu
Click Finish to run Trim tags (Figure 1)

The output of Trim tags is a Trimmed reads data node. An additional Untrimmed reads data node will be generated if the Keep untrimmed option was selected.

The task report provides a table with the total reads, reads retained, % reads retained, reads removed, and % reads removed for each sample (Figure 2). You can click Download at the bottom of the table to save a text file copy to your computer.

Building a custom prep kit

Select Other / Custom from the Prep kit name drop-down menu
Give the new prep kit a name
Choose Build prep kit

You can select Import prep kit if you have a Prep kit .zip file downloaded from Partek Flow.

Click Create (Figure 3)

The Prep kit builder interface will load (Figure 4).

There are three sections:

Is paired end - select to switch from single end to paired end FASTQ files (Figure 5). If you choose paired end, the First mate will correspond to the _R1 FASTQ file and the Second mate will correspond to the _R2 FASTQ file.

Figure 5. Paired end prep kits have first and second mate segmentation sections

Segmentation - this is where you will describe the structure of your reads

Click to add a segment.

Segments include adaptors, barcodes, UMIs, and the insert (i.e., the target sequence of the assay)

Adaptors

For adaptors, you have the option of choosing a file with your adaptor sequences or entering the adaptor sequences manually.

To use a file, choose File for Sequences and then click Choose File (Figure 6). Use the file browser to choose a FASTA file from your local computer.

To enter the sequences manually, choose Manual for Sequences then type or paste the adaptor sequences into the text field and click to add the adaptor (Figure 7). You must click for the adaptor sequence to be included. You can remove any adaptor you have added by clicking .

You can specify the mismatch allowance using the Mismatches option.

After you have specified the file or manually entered the sequences, click Add to add the adaptor sequence(s).
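One way to picture a mismatch allowance is as a limit on the number of differing positions between the adaptor and the start of the read. The sketch below counts substitutions only (Hamming distance) and checks the 5' start of the read; whether Partek Flow counts mismatches this way is an assumption, and the names are hypothetical.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_adaptor(read, adaptors, max_mismatches=1):
    """Return the first adaptor that matches the start of `read`, or None."""
    for adaptor in adaptors:
        prefix = read[:len(adaptor)]
        if len(prefix) == len(adaptor) and hamming(prefix, adaptor) <= max_mismatches:
            return adaptor
    return None


print(match_adaptor("ACGTTTTT", ["ACGA"], max_mismatches=1))  # one mismatch allowed
print(match_adaptor("ACGTTTTT", ["ACGA"], max_mismatches=0))  # exact match required
```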

UMIs

Unique Molecular Identifiers (UMIs) are randomly generated sequences that uniquely identify an original starting molecule after PCR amplification.

Including a UMI in your prep kit will allow you to access a downstream task that uses UMI information for removing PCR duplicates. For more information about the Deduplicate UMIs task, please see our . Note that while the UMI sequence will be trimmed, a record of the UMI sequence for each read is retained for use by this downstream task.
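To show why the retained UMI record matters, the sketch below removes PCR duplicates in the simplest possible way: reads that share the same alignment position and the same UMI are treated as copies of one original molecule. This is not the Deduplicate UMIs task itself, and real tools often also tolerate sequencing errors in the UMI. The dictionary keys used here are hypothetical.

```python
def deduplicate_by_umi(alignments):
    """Keep one read per (chromosome, position, strand, UMI) combination."""
    seen = set()
    unique = []
    for aln in alignments:
        key = (aln["chrom"], aln["pos"], aln["strand"], aln["umi"])
        if key not in seen:
            seen.add(key)
            unique.append(aln)
    return unique


toy_alignments = [
    {"read": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},
    {"read": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},  # PCR copy of r1
    {"read": "r3", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTGA"},  # different molecule
]
deduped = deduplicate_by_umi(toy_alignments)
print([aln["read"] for aln in deduped])
```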

When adding a UMI segment to your prep kit, you can specify the length of your UMIs (Figure 8).

Barcode

Adding a barcode segment to a prep kit allows you to access downstream tasks that use barcode information, including and Quantify barcodes to annotation model (Partek E/M). While the barcode sequence will be trimmed, a record of the barcode sequence for each read is retained for use by downstream tasks.

Like adaptors, barcodes can be specified using a file or manually specified, but you can also choose to designate any segment of arbitrary length in the sequence as the barcode. This is useful if you do not have a specific set of known barcodes.

To set the barcode to an arbitrary segment of fixed length, choose Arbitrary and specify the barcode length (Figure 9).

Remember to click Add to add the new segment to your prep kit.

Insert

The insert is the sequence retained after trimming in the Trimmed reads data node. For example, in RNA-Seq, this would be the mRNA sequence. Every prep kit must include an insert segment. You can specify the minimum size of the insert section using the Length field (Figure 10). Reads shorter than the minimum length will be discarded.

Remember to click Add to add the new segment to your prep kit.

Ordering segments

Segments are placed from 5' to 3' in the read in the order they are added. You should add the 5' segment first and add additional elements in order of their position in the read. Segments will appear in the Segmentation sections as they are added. You can mouse over a segment to view its details (Figure 11).
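The effect of segment order can be sketched by slicing a read from 5' to 3'. The example below is a simplified illustration, not the Prep kit file format: it supports only fixed-length segments followed by an insert that takes the rest of the read, and the layout shown (6-base barcode, 4-base UMI) is invented for the example rather than taken from any distributed prep kit.

```python
def split_read(seq, segments, min_insert_length=1):
    """Split a read 5' to 3' using an ordered list of (name, length) segments.

    A length of None marks the insert, which takes the rest of the read and is
    assumed to be the last segment. Returns a dict of segment name to sequence,
    or None if the read does not fit the structure.
    """
    parts, pos = {}, 0
    for name, length in segments:
        if length is None:
            parts[name] = seq[pos:]
            pos = len(seq)
        else:
            if pos + length > len(seq):
                return None  # read is too short for this segment
            parts[name] = seq[pos:pos + length]
            pos += length
    if len(parts.get("insert", "")) < min_insert_length:
        return None  # insert is shorter than the minimum length
    return parts


layout = [("barcode", 6), ("umi", 4), ("insert", None)]  # hypothetical layout
parsed = split_read("AAAAAACCCCGGGGTTTT", layout, min_insert_length=5)
print(parsed)
```

Swapping the first two entries of the layout would assign the first four bases to the UMI instead, which is why segments need to be added in the order they occur in the read.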

Custom prep kit example

For example, the expected read structure (Figure 12) and a completed prep kit for a standard Drop-seq library prep are shown below (Figure 13).

Remove poly-A tail - choose this option to trim poly-A tails from the ends of the read with your insert sequence

Click Next to complete your prep kit

Managing prep kits

You can manage saved prep kits by going to Home > Settings > Library file management and opening the Prep kit files tab (Figure 14).

You can add new prep kits from this page by clicking .

You can preview a prep kit by clicking , delete a prep kit by clicking , and download a prep kit to your computer by clicking .

Prep kits download as a .zip file. This Prep kit .zip file can be imported into Partek Flow by selecting Import from a file when adding a new prep kit. Select the .zip file itself when importing; do not unzip it.


